[MUSIC] In this video, I want to remind you that the NLP area is not only about mathematics but also about linguistics, and it is really important to remember that. So the first slide will be about a picture that is very popular in many introductions to NLP, but I think we also need to cover it briefly. Let us say that we are given some sentence. There are different stages of analysis for that sentence. The first stage, which is called the morphological stage, is about the different forms of words. For example, we care about part-of-speech tags, and we care about cases, genders, and tenses. So this is everything that concerns single words in the sentence. The next stage, syntactic analysis, is about the relations between words in the sentence. For example, we can know that there are some objects and subjects and so on. The next stage, once we know some syntactic structures, is about semantics. Semantics is about meaning. So you see, we are going higher and higher in our level of abstraction, going from plain symbols to meanings. And finally, pragmatics would be the highest level of this abstraction.

Now, one reason why we do not cover all of these building blocks in much detail later in our course is that you can just use some very nice out-of-the-box implementations for the low-level stages. For example, for morphological and syntactic analysis, you might try the NLTK library, which is a really convenient tool in Python. So please feel free to investigate it. Another thing that I wanted to mention is the Stanford parser. It is a parser for syntactic analysis that provides different options and has lots of different models built in. Gensim and MALLET deal with more high-level abstractions. For example, you can work on some classification problems there, or you can think about semantics: they provide topic models and the word embedding representations that we will discuss later in week three.

Now, another thing that also comes from the linguistic part of our area is the different types of relations between words. Linguists know a lot about what those types can be, and this knowledge can be found in external resources. For example, WordNet is a resource that tells you that there are some hierarchical relationships. For instance, we have fruits, and then particular types of fruits like peach, apple, orange, and so on; this relation is called hyponymy and hypernymy. There are also other relationships, like part and whole. For example, you have a wheel and a car; this type of relationship is called meronymy. These types of relationships can be found in the WordNet resource. Here on this slide, I have a picture of another resource, BabelNet. The BabelNet resource is multilingual, so you can find concepts in different languages there, and, what is nice, you have relations between these concepts. For example, I just typed in NLP there, and then I saw the part-of-speech tagging task. I clicked on this task and I could see some nearest neighbors in this space of concepts. For example, I can see that the Viterbi algorithm and the Baum-Welch algorithm are somewhere close by, and after week two of our course, you will know that they are indeed very related to this task. So the takeaway from this slide is to remember that there are external resources that can be nicely used in our applications. For example, how can they be used?
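To make the low-level stages and the WordNet relations above a bit more concrete, here is a minimal sketch in Python using the NLTK library mentioned earlier. It is only an illustration under the assumption that NLTK and its data packages are installed; the exact tags and synsets you get back depend on your NLTK and WordNet versions.

```python
# Minimal sketch: morphological analysis (POS tags) and WordNet relations with NLTK.
# Assumes NLTK is installed and the needed data is downloaded, for example:
#   import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger'); nltk.download('wordnet')
import nltk
from nltk.corpus import wordnet as wn

sentence = "Mary left the football in the kitchen"

# Morphological level: split the sentence into words and tag each with a part of speech.
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))  # e.g. [('Mary', 'NNP'), ('left', 'VBD'), ('the', 'DT'), ...]

# Hypernyms: for each sense of 'peach', print the more general concepts.
# One of the fruit senses should list something like an 'edible_fruit' synset.
for synset in wn.synsets('peach'):
    print(synset.name(), '->', synset.hypernyms())

# Meronyms (part-whole relations): the listed parts of a car.
print(wn.synset('car.n.01').part_meronyms())
```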
This is a rather complicated task. It is called reasoning, and the setup is that there is some story in natural language. For example: Mary got the football, she went to the kitchen, she left the ball there. Okay, so we have some story, and then we have a question about this story: where is the football now? To answer this question, the machine needs to somehow understand something, right? One way to build such a system is based on deep learning. You might have heard about LSTM networks; they are a particular type of recurrent neural network. But here, you see that you have not only the sequential transition edges in your data representation, but also some other edges. Those red edges tell you about coreference. Coreference is another linguistic type of relation between words that says, for example, that she is the same as Mary, right? So she is just a substitute for Mary. And, for example, this football and that football are the same ball, just mentioned twice. The green edge is about the hypernym relationship that I briefly mentioned: the football is a particular type of ball, right? So once we know that our words have some relationships, we can add some additional edges to our data structure. And after that, we can use a so-called DAG-LSTM, a directed acyclic graph LSTM, that will try to utilize these edges, okay? I am not going to cover the DAG-LSTM model now. I just want you to see that there is a way to use linguistic knowledge for our needs here and to improve the performance of some particular question answering task, for example.

In the rest of the video, I want to cover another example of linguistic information used in a system. This will be about syntax. So let us have just a few more details on how syntax can be represented. Usually these are some kinds of trees. Here you can see a dependency tree, and it says, for example, that the word shot is the main word here, and it has the subject I and the object elephant. And elephant has the modifier an, and so on. Right, so you have some dependencies between the words, and usually you can obtain these with syntactic parsers. Another way to represent syntax is with so-called constituency trees. You can see the same sentence at the bottom of the slide, and then you parse it from bottom to top to get this hierarchical structure. So you know that an and elephant are a determiner and a noun, respectively, and then you merge them to get a noun phrase. Then you merge it with a verb, which is shot, and get a verb phrase. You merge that with another subtree and get a bigger verb phrase. And finally, this verb phrase plus the noun phrase I gives you the whole sentence. Actually, you can stop at some point, so you do not parse the whole structure from bottom to top, but just say that it is enough for you to know that, for example, some phrase corresponds to some particular subtree. Why can this be useful? This is called shallow parsing, and it is used, for example, in named entity recognition, because a named entity is very likely to be a noun phrase taken as a whole, right? New York City would be a nice noun phrase in some sentence. So it can help there, but the whole tree can also help in some other tasks. An example of such a task would be sentiment analysis. Sentiment analysis treats reviews as pieces of text and tries to predict whether they are positive or negative, or maybe neutral.
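As a small illustration of shallow parsing, here is a sketch of noun phrase chunking with NLTK's RegexpParser. The chunk grammar below is my own toy pattern, not something from the lecture, so treat it as an assumption; a real named entity recognition system would use a much better chunker or parser.

```python
# Minimal sketch: shallow parsing (NP chunking) on top of POS tags with NLTK.
# Assumes the 'punkt' and 'averaged_perceptron_tagger' data packages are downloaded.
import nltk

sentence = "I shot an elephant in New York City"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# Toy chunk grammar (an assumption): a noun phrase is an optional determiner,
# any number of adjectives, and then one or more nouns (common or proper).
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

tree = chunker.parse(tagged)
print(tree)  # 'an elephant' and 'New York City' should come out as NP subtrees
```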
So here you can see that you have some pluses, minuses, and zeros, which stand for the sentiment. You have your sentence, right? And then you parse it with your syntactic parser, so you get the nice subtrees that we have just seen on the previous slide. The idea is that if you know the sentiment of some particular words, for example you know that humor is good, then you can try to merge those sentiments to produce the sentiment of the whole phrase. Okay, so intelligent and humor are both good, and they give you some good sentiment for the phrase. But then, when you have a not in the sentence, you get not good, which results in a negative sentiment for the whole sentence. This is a rather advanced approach. It is called recursive neural networks, or directed acyclic graph neural networks, and so on. Sometimes they can be useful, but in many practical cases it is just enough to do some simpler classification for your task. So in the rest of this week, my colleague will discuss the classification task, for example for sentiment analysis, in much detail. [MUSIC]