Hey. In this video, we will briefly discuss what will be covered during the next weeks. During this week we will discuss text classification tasks. These are tasks that are very common in applications: for example, you need to predict the sentiment of some reviews, so you need to know whether a review is positive or negative; or you need to filter spam in your emails, or something else. What you actually do is represent your text as a bag of words, compute some useful features, and apply a machine learning algorithm to predict the class of the text. There are lots of practical tips you need to know to succeed in this task, and during this week my colleague will tell you about them.

Now, the next week will be about representing text not as a bag of words but as a sequence. So, what can you do when you represent a text as a sequence of words? One task would be language modeling. Language models are about predicting the probability of the next word given some previous words. This can be used to do text generation, and that is useful in many applications. For example, in machine translation you are given some sequence of words, some sentence in English, and then you need to translate it, let's say to Russian; so you need to generate some Russian text, and that is where you need a language model.

Another important task is called sequence tagging. This is the task where you have a sequence of words and you need to predict a tag for each word in this sequence. For example, it could be part-of-speech tags, so you need to know that some words are nouns, some words are verbs, and so on. Another task would be to find named entities, and this is really useful: for example, you can find names of cities and use them as features for the previous task, text classification. Another task, called semantic slot filling, was just covered in our previous video. This is about slots: for example, you need to parse a query and understand that the person wants to book a table for some specific time in some specific place. That time and place would be the slots for you.
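Just to make the bag-of-words pipeline above concrete, here is a minimal sketch assuming scikit-learn and a tiny made-up set of reviews; the library, data, and model choice are illustrative assumptions, not necessarily what the course itself uses.

```python
# Minimal sketch of a bag-of-words text classifier (hypothetical toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled reviews: 1 = positive, 0 = negative (made up for illustration).
texts = [
    "I loved this movie, great acting",
    "Absolutely wonderful, would watch again",
    "Terrible plot and boring characters",
    "Worst film I have seen this year",
]
labels = [1, 1, 0, 0]

# Bag of words: each review becomes a vector of word counts,
# then a simple linear classifier predicts the class from those counts.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["boring and terrible acting"]))  # likely [0]
print(model.predict(["great, I loved it"]))           # likely [1]
```

The vectorizer plays the role of the "bag of words plus features" step, and the classifier is the machine learning algorithm on top of it.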
Now, we can do something even more complicated and try to understand the meaning of words or of some pieces of text. How do we represent meaning? Well, one easy way to do this is to use vectors. You map all the words to some vectors, let's say 300-dimensional vectors of float numbers, and these vectors have really nice properties: similar words will have similar vectors. For example, this nice picture tells you that "Cappuccino" and "Espresso" are the same kind of thing, just because the cosine similarity between their vectors is really high.

We will also discuss topic models. Topic models deal with documents as a whole and also represent them by vectors, which can tell you what the topics are in those documents. This is really useful when you need to, for example, describe the topics of a big dataset like Wikipedia, some news flows, social networks, or any other text data that you are interested in.

Now, this is just another example of how those methods can be used. Let's say that we represent our words by vectors, just three-dimensional vectors, and we have them depicted here in this space. We know the similarity between them, so we know the distances between those blue dots. Once we know these distances, we can create a similarity graph for words. In the middle picture, the nodes are words, and the edges carry the similarities between the nodes. Now, this graph is actually very useful. Why? Because when you have labels for some nodes of this graph, for example if you know that "Laugh" has the label "Funny," you can try to propagate these labels through the graph, so that words that are similar get the same labels. For example, the word "Haha" there will also get the label "Funny," because it is similar to the word "Laugh." This can be used in many different applications, and we will cover it in week three.
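To make the word-vector and label-propagation ideas concrete, here is a toy sketch with made-up three-dimensional vectors; the words, vectors, and the similarity threshold are purely illustrative assumptions, not the course's actual data.

```python
# Toy illustration of cosine similarity between word vectors and one step
# of propagating a label from a word to its nearest neighbours.
import numpy as np

# Hypothetical 3-dimensional word vectors, made up for demonstration.
vectors = {
    "laugh": np.array([0.9, 0.1, 0.0]),
    "haha":  np.array([0.8, 0.2, 0.1]),
    "tax":   np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similar words get high cosine similarity, dissimilar words get low similarity.
print(cosine(vectors["laugh"], vectors["haha"]))  # close to 1
print(cosine(vectors["laugh"], vectors["tax"]))   # close to 0

# One step of label propagation: "laugh" is labelled "funny", and any word
# whose similarity to "laugh" exceeds a (hypothetical) threshold inherits it.
labels = {"laugh": "funny"}
for word, vec in vectors.items():
    if word not in labels and cosine(vec, vectors["laugh"]) > 0.8:
        labels[word] = "funny"

print(labels)  # {'laugh': 'funny', 'haha': 'funny'}
```

In practice the vectors come from an embedding model and the propagation runs over the whole graph, but the basic mechanism is the same.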
Now, the next week will be more advanced: it will be about sequence-to-sequence tasks. Actually, nearly any task in NLP can somehow be stated as a sequence-to-sequence task. Just to give you a few examples: in machine translation, obviously, you have one sentence and you need to translate it into another sentence, so these are the two sequences. In summarization, you have a big document as input, which is some long sequence, and you need to produce a short summary, which is also a sequence. You get this kind of task in speech recognition, or in a conversational chat-bot, where you have questions and answers. Right? All these tasks can be nicely solved with the so-called encoder-decoder architecture in neural networks. Let us see just the idea. We are given a sentence, and we have an encoder. We feed this sentence to the encoder, and what we get is some hidden representation of the input sentence. After that, the decoder generates the output sentence. This is how we get our final translation, or summary, or something else.

Now, during the last week of our course, we will combine all the knowledge that we have to build a dialogue system. Dialogue systems can be different, and there are at least two important types. One type is goal-oriented agents that try to solve some particular task: for example, they can assist you in a bank, or help you with online shopping, or something like that. On the contrary, there are also conversational, entertaining chat-bots that just want to somehow hold a conversation with you. There are different types of methods to be used in these kinds of tasks, and we will cover them in detail during the last week. The project will be about a Stack Overflow chat-bot that tries to assist with search. So, stay with us, and we will discuss everything in more detail.
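As a closing illustration of the encoder-decoder idea from the sequence-to-sequence week, here is a minimal, untrained sketch assuming PyTorch; the layer sizes, token ids, and greedy decoding loop are illustrative assumptions rather than the course's actual model.

```python
# Minimal encoder-decoder sketch: the encoder compresses the input token ids
# into a hidden vector, and the decoder generates output tokens from it.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, max_len=5, start_id=0):
        # Encode: the whole source sentence becomes one hidden representation.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode: greedily generate tokens, feeding each prediction back in.
        token = torch.full((src_ids.size(0), 1), start_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            step, hidden = self.decoder(self.tgt_emb(token), hidden)
            token = self.out(step).argmax(dim=-1)
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.tensor([[5, 17, 42, 8]])  # a fake "source sentence" of token ids
print(model(src))                     # shape (1, 5): generated target token ids
```

A real system would of course be trained on parallel data and use attention, but the encode-then-decode structure is exactly the idea described above.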