Hi, in this video I want to overview what we have done this week.

We have overviewed so-called task-oriented dialog systems, and our dialog system looks like the following. We get speech from the user and convert it to text using ASR, or we get text directly, as in chat bots. Then comes Natural Language Understanding, which gives us intents and slots from that natural language. Then there is a magic box called the Dialog Manager, and it actually does two things: it tracks the dialog state and it learns the dialog policy, that is, what should be done and what the user actually wants. The Dialog Manager can query a backend like Google Maps or Yelp or any other. And then it has to say something to the user, so we need to convert the text from the Dialog Manager to speech with some Natural Language Generation.

The red boxes here are the parts of the system that we don't overview, because that would take a lot of time, and the system can actually work without them. It can take the user input as text, so you will not need ASR. Then you can output your response to the user as text as well, so you don't need Natural Language Generation. And sometimes you don't need a backend action to solve the user's task.
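The text-in/text-out loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in (the rule-based NLU, the hand-crafted policy, the slot names), not a real library; it only shows how the components plug together:

```python
# Minimal sketch of a text-in/text-out task-oriented dialog loop.
# All components are illustrative stand-ins, not a real system.

def nlu(utterance: str) -> dict:
    """Toy NLU: map an utterance to an intent and slots (rule-based stand-in)."""
    if "taxi" in utterance:
        return {"intent": "book_taxi", "slots": {"destination": "airport"}}
    return {"intent": "unknown", "slots": {}}

def dialog_manager(state: dict, nlu_result: dict) -> tuple:
    """Toy Dialog Manager: track the dialog state, then pick an action (the policy)."""
    state.update(nlu_result["slots"])           # state tracking
    if nlu_result["intent"] == "book_taxi":     # hand-crafted policy rule
        # A real system could query a backend (e.g. a maps API) here.
        return state, f"Booking a taxi to {state['destination']}."
    return state, "Sorry, I did not understand."

state, reply = dialog_manager({}, nlu("I need a taxi"))
print(reply)  # -> Booking a taxi to airport.
```

In a full system the `nlu` stub would be a trained intent classifier plus slot tagger, and the `if` in `dialog_manager` would be a learned policy; the loop structure stays the same.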
We have overviewed Natural Language Understanding and the Dialog Manager in detail. Let me remind you: you can train a slot tagger and an intent classifier, which are basically the NLU, and you can train them separately or jointly. Training them jointly yields better results. You can also train the NLU and the Dialog Manager separately or jointly, and joint training gives better results there as well. You can sometimes use hand-crafted rules, for example for the dialog policy or state tracking, but learning from data actually works better if you have time for that.

Let me remind you how we evaluate the NLU and the Dialog Manager. For NLU, we use turn-level metrics like intent accuracy and slot F1. For the Dialog Manager, there are two kinds of metrics. The first is turn-level metrics: after every turn in the dialog, we track, let's say, state accuracy or policy accuracy. And there are dialog-level metrics like success rate, whether the dialog solved the user's problem or not, or what reward we got when we solved that problem. The reward could be the number of turns, and we want to minimize the number of turns so that we solve the task for the user faster.

And here, actually, is the question.
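As an aside, the metrics just listed are simple to compute. Here is a toy sketch with made-up predictions; the slot F1 is micro-averaged over (slot, value) pairs, which is one common convention, though implementations vary:

```python
# Toy computation of turn-level NLU metrics and a dialog-level metric.
# The data below is invented purely for illustration.

def intent_accuracy(true_intents, pred_intents):
    """Fraction of turns whose intent was classified correctly."""
    correct = sum(t == p for t, p in zip(true_intents, pred_intents))
    return correct / len(true_intents)

def slot_f1(true_slots, pred_slots):
    """Micro-averaged F1 over (slot, value) pairs for one turn."""
    tp = len(true_slots & pred_slots)
    precision = tp / len(pred_slots) if pred_slots else 0.0
    recall = tp / len(true_slots) if true_slots else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Turn-level NLU metrics on three toy turns
acc = intent_accuracy(["book", "find", "book"], ["book", "book", "book"])
f1 = slot_f1({("dest", "airport"), ("time", "9am")}, {("dest", "airport")})

# Dialog-level metric: success rate over a set of dialogs
successes = [True, True, False, True]
success_rate = sum(successes) / len(successes)
print(round(acc, 2), round(f1, 2), success_rate)  # -> 0.67 0.67 0.75
```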
We have the NLU and the Dialog Manager, and if we train them separately, we want to understand how the errors of the NLU affect the final quality of our Dialog Manager.

Here, on the left vertical axis, we have success rate, and on the right axis we have the average number of turns in the dialog. We have three colors in the legend: the blue one is when we don't have any NLU errors, the green one is when we have 10% errors in the NLU, and the red one is when we have 20% errors in our NLU. And you can see what happens: when you have a large error in the NLU, the success rate of your task actually decreases, and the number of turns needed to solve the task, where there was a success, actually increases. So it takes more time for the user to solve their task, and the chance of solving that task is lower.

But the NLU actually consists of an intent classifier and a slot tagger, so let's see which one is more important. Let's look at what happens when we change the intent error rate. It looks like it doesn't affect the quality, the success rate, of our dialog that much, and the dialogs don't become that much longer.
So it looks like intent error is not as important as slot tagging, and we will see now why: when you introduce the same amount of error in slot tagging, that actually decreases the success rate of the dialog dramatically. It seems that slot tagging error is the main problem for our success rate, so it looks like we need to concentrate on the slot tagger.

And that can give you some insight when you want to train a joint model, where you have a loss for intent and a loss for slot tagging. You can actually come up with some weights for them, and the intuition is the following: it seems like the slot tagging loss should have a bigger weight, because it is more important for the success of the whole dialog.

Let me summarize. We have overviewed what a task-oriented dialog system looks like, and we have overviewed in depth the NLU component and the Dialog Manager component. This is the basic knowledge that you will need to build your own task-oriented dialog system. So that's it for this week; I wish you good luck with your final project.
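As a final footnote to the joint-training intuition above, the weighted combination of the two losses could be sketched like this. The weight of 2.0 is an illustrative assumption, not a tuned value; the only point is that the slot-tagging term gets the larger weight:

```python
# Sketch of weighting the two loss terms in a joint intent + slot model.
# slot_weight = 2.0 is an arbitrary illustrative choice; in practice it
# would be tuned on validation data.

def joint_loss(intent_loss: float, slot_loss: float,
               slot_weight: float = 2.0) -> float:
    """Weighted sum of the two losses; slot errors hurt dialog success
    more, so the slot-tagging term gets the bigger weight."""
    return intent_loss + slot_weight * slot_loss

print(joint_loss(0.3, 0.5))  # -> 1.3
```

In a real training loop the two scalars would be the per-batch cross-entropy losses of the intent classifier and the slot tagger, and the weighted sum is what you would backpropagate through.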