In this video, I want to remind you that the NLP area is not only about mathematics but also about linguistics, and it is really important to remember that.

So the first slide is about a picture that is really very popular in many introductions to NLP, but I think we still need to cover it briefly. Let us say that we are given some sentence. There are different stages of analysis for that sentence. The first stage, which is called the morphological stage, is about the different forms of words: for example, we care about part-of-speech tags, and we care about different cases, genders, and tenses. So this is everything that concerns the single words in the sentence. The next stage, syntactic analysis, is about the relations between the words in the sentence: for example, we can know that there are some objects and subjects, and so on. The stage after that, once we know the syntactic structures, is about semantics, and semantics is about meaning. So you see, we are going higher and higher in our abstraction, from plain symbols to meanings. Finally, pragmatics would be the highest level of this abstraction.

Now, one reason why we do not cover all of these building blocks in much detail later in our course is that you can just use some very nice off-the-shelf implementations for the low-level stages. For example, for morphological and syntactic analysis you might try the NLTK library, which is a really convenient tool in Python, so please feel free to investigate it. Another thing that I wanted to mention is the Stanford parser. It is a parser for syntactic analysis that provides different options and has lots of different models built in. Gensim and MALLET address more high-level abstractions: for example, you can solve classification problems there, or you can think about semantics. There you have topic models and some word embedding representations that we will discuss later in week three.
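To make the low-level stages a bit more concrete, here is a minimal sketch of tokenization and part-of-speech tagging with NLTK. It is my own illustration, not something from the slides, and it assumes the library and its default tokenizer and tagger models have already been downloaded.

```python
# A minimal sketch of morphological-level analysis with NLTK.
# Assumes: pip install nltk, plus nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger") have been run once.
import nltk

sentence = "Mary left the ball in the kitchen."

tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tags = nltk.pos_tag(tokens)             # part-of-speech tag for each token
print(tags)
# Expected output along the lines of:
# [('Mary', 'NNP'), ('left', 'VBD'), ('the', 'DT'), ('ball', 'NN'), ...]
```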
Now, another thing which also comes from the linguistic part of our area is the different types of relations between words. Linguists know really a lot about what those types can be, and this knowledge can be found in external resources.

For example, WordNet is a resource that tells you that there are some hierarchical relationships: we have fruits, and then some particular types of fruits, like peach, apple, orange, and so on. This relation is called hyponymy and hypernymy. There are also other relationships, like the part and the whole: for example, a wheel and a car. This type of relationship is called meronymy. All of these relationships can be found in the WordNet resource.

On this slide I have a picture of another resource, BabelNet. BabelNet is multilingual, so you can find concepts in different languages there, and, what is nice, you also have relations between these concepts. For example, I just typed in "NLP" there and saw the part-of-speech tagging task. I clicked on this task and could see some nearest neighbors in this space of concepts; for example, I can see that the Viterbi algorithm and the Baum-Welch algorithm are somewhere close by. After week two of our course, you will know that they are indeed very related to this task.

So the takeaway from this slide is to remember that there are external resources that can be nicely used in our applications.
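As a quick illustration of querying such a resource, here is a small sketch using NLTK's WordNet interface. The particular words and senses are just my examples, and it assumes the WordNet corpus has been downloaded.

```python
# A small sketch of looking up WordNet relations through NLTK.
# Assumes nltk.download("wordnet") has been run once.
from nltk.corpus import wordnet as wn

apple = wn.synsets("apple")[0]       # first sense of "apple"
print(apple.hypernyms())             # more general concepts (hypernyms)

car = wn.synsets("car")[0]           # first sense of "car"
print(car.part_meronyms())           # concepts that are parts of a car (part-whole relation)
```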
For example, how can these resources be used inside a system? Here is a rather complicated task. It is called reasoning, and the setting is that there is some story in natural language: for example, Mary got the football, she went to the kitchen, she left the ball there. So we have a story, and after this story we have a question: where is the football now? To answer this question, the machine needs to somehow understand something, right? One way to build such a system is based on deep learning. You might have heard about LSTM networks, which are a particular type of recurrent neural networks. But here you can see that in the data representation you have not only the sequential transition edges but also some other edges. The red edges tell you about coreference. Coreference is another linguistic type of relation between words; it says that "she" is the same as Mary, right? So "she" is just a substitute for Mary. And, for example, this football and that football are the same ball, just mentioned twice. The green edge shows the hypernym relationship that I briefly mentioned: the football is a particular type of ball.

So once we know that our words have some relationships, we can add additional edges to our data structure, and after that we can use the so-called DAG-LSTM, a directed acyclic graph LSTM, that will try to utilize these edges. I am not going to cover the DAG-LSTM model now; I just want you to see that there is a way to use linguistic knowledge for our needs here and to improve the performance of some particular task, for example question answering.

In the rest of the video, I want to cover another example of linguistic information used in a system. This will be about syntax. So let us have just a few more details on how syntax can be represented. Usually these are some kinds of trees. Here you can see a dependency tree, and it says that, for example, the word "shot" is the main word here; it has the subject "I" and the object "elephant", and "elephant" has the modifier "an", and so on. So you have dependencies between the words, and usually you can obtain them with some syntactic parser. Another way to represent syntax is so-called constituency trees. You can see the same sentence at the bottom of the slide, and then you parse it from bottom to top to get this hierarchical structure. You know that "an" and "elephant" are a determiner and a noun, respectively, and you merge them to get a noun phrase. Then you merge it with the verb, which is "shot", and get a verb phrase. You merge it with another subtree and get a bigger verb phrase. And finally, this verb phrase plus the noun phrase "I" gives you the whole sentence. Actually, you can stop at some point, so that you do not parse the whole structure from bottom to top, but just say that it is enough for you to know that, for example, "in my pajamas" is some particular subtree. Why can this be useful? First, this is called shallow parsing, and it is used, for example, in named entity recognition, because a named entity is very likely to be a noun phrase taken as a whole: "New York City" would be a nice noun phrase in some sentence.
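To illustrate shallow parsing, here is a small sketch of noun-phrase chunking with NLTK's RegexpParser on the same example sentence. The chunk grammar is a simplified pattern I chose for illustration, and it again assumes the NLTK tagger models are available.

```python
# A toy sketch of shallow parsing (noun-phrase chunking) with NLTK.
import nltk

sentence = "I shot an elephant in my pajamas"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# A simplified NP pattern: optional determiner or possessive pronoun,
# any number of adjectives, then one or more nouns.
grammar = r"NP: {<DT|PRP\$>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)   # should group "an elephant" and "my pajamas" as NP subtrees
```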
So shallow parsing can help there, but the whole tree can also help in some other tasks, and an example of such a task is sentiment analysis. Sentiment analysis treats reviews as pieces of text and tries to predict whether they are positive or negative, or maybe neutral. Here you can see pluses, minuses, and zeros, which stand for the sentiment. So you have your sentence, and then you parse it with your syntax, so you get those nice subtrees that we have just seen on the previous slide. The idea is that if you know the sentiment of some particular words, for example you know that "humor" is good, then you can try to merge those sentiments to produce the sentiment of the whole phrase. So "intelligent" and "humor" are both good, and together they give you a positive sentiment. But then, when you have "not" in the sentence, you get "not good", which results in a negative sentiment for the whole sentence.

This is a rather advanced approach. It is called recursive neural networks, or directed acyclic graph neural networks, and so on. Sometimes they can be useful, but in many practical cases it is just enough to do some simpler classification. So in the rest of this week, my colleague will discuss the classification task, for example for sentiment analysis, in many, many details.
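Coming back to the compositional sentiment idea above, here is a toy sketch, not the actual recursive neural network, of how word sentiments could be merged bottom-up over a parse tree. The tiny lexicon and the crude negation rule are invented purely for illustration.

```python
# A toy, hand-written sketch of composing sentiment bottom-up over a tree.
# Leaves are words; internal nodes are tuples of their children.
lexicon = {"intelligent": 1, "humor": 1, "boring": -1}   # made-up word sentiments

def sentiment(tree):
    if isinstance(tree, str):                 # leaf: look the word up (0 if unknown)
        return lexicon.get(tree, 0)
    total = sum(sentiment(child) for child in tree)
    if "not" in tree:                         # crude negation: flip the phrase sentiment
        total = -total
    return total

# "intelligent humor" alone is positive, but adding "not" above it flips the
# whole phrase to negative, mimicking the behaviour described on the slide.
print(sentiment(("intelligent", "humor")))            # 2  (positive)
print(sentiment(("not", ("intelligent", "humor"))))   # -2 (negative)
```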