In this video, I want to remind you that the NLP area is not only about mathematics but also about linguistics, and it is really important to remember that.

So the first slide is about a picture that is really very popular in many introductions to NLP, but I think we still need to cover it briefly. Let us say that we are given some sentence. There are different stages of analysis for that sentence. The first stage, which is called the morphological stage, is about the different forms of words: for example, we care about part-of-speech tags, and we care about different cases, genders, and tenses. So this is everything that concerns the single words in the sentence. The next stage, syntactic analysis, is about the relations between the words in the sentence: for example, we can know that there are some objects and subjects, and so on. The stage after that, once we know the syntactic structures, is about semantics, and semantics is about meaning. So you see, we are going higher and higher in our abstraction, from plain symbols to meanings. Finally, pragmatics would be the highest level of this abstraction.

Now, one reason why we do not cover all of these building blocks in much detail later in our course is that you can just use some very nice off-the-shelf implementations for the low-level stages. For example, for morphological and syntactic analysis you might try the NLTK library, which is a really convenient tool in Python, so please feel free to investigate it. Another thing that I wanted to mention is the Stanford parser. It is a parser for syntactic analysis that provides different options and has lots of different models built in. Gensim and MALLET address more high-level abstractions: for example, you can solve classification problems there, or you can think about semantics. There you have topic models and some word embedding representations that we will discuss later in week three.
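To make the low-level stages a bit more concrete, here is a minimal sketch of tokenization and part-of-speech tagging with NLTK. It is my own illustration, not something from the slides, and it assumes the library and its default tokenizer and tagger models have already been downloaded.

```python
# A minimal sketch of morphological-level analysis with NLTK.
# Assumes: pip install nltk, plus nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger") have been run once.
import nltk

sentence = "Mary left the ball in the kitchen."

tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tags = nltk.pos_tag(tokens)             # part-of-speech tag for each token
print(tags)
# Expected output along the lines of:
# [('Mary', 'NNP'), ('left', 'VBD'), ('the', 'DT'), ('ball', 'NN'), ...]
```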
Now, another thing which also comes from the linguistic part of our area is the different types of relations between words. Linguists know really a lot about what those types can be, and this knowledge can be found in external resources.

For example, WordNet is a resource that tells you that there are some hierarchical relationships: we have fruits, and then some particular types of fruits, like peach, apple, orange, and so on. This relation is called hyponymy and hypernymy. There are also other relationships, like the part and the whole: for example, a wheel and a car. This type of relationship is called meronymy. All of these relationships can be found in the WordNet resource.

On this slide I have a picture of another resource, BabelNet. BabelNet is multilingual, so you can find concepts in different languages there, and, what is nice, you also have relations between these concepts. For example, I just typed in "NLP" there and saw the part-of-speech tagging task. I clicked on this task and could see some nearest neighbors in this space of concepts; for example, I can see that the Viterbi algorithm and the Baum-Welch algorithm are somewhere close by. After week two of our course, you will know that they are indeed very related to this task.

So the takeaway from this slide is to remember that there are external resources that can be nicely used in our applications.
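As a quick illustration of querying such a resource, here is a small sketch using NLTK's WordNet interface. The particular words and senses are just my examples, and it assumes the WordNet corpus has been downloaded.

```python
# A small sketch of looking up WordNet relations through NLTK.
# Assumes nltk.download("wordnet") has been run once.
from nltk.corpus import wordnet as wn

apple = wn.synsets("apple")[0]       # first sense of "apple"
print(apple.hypernyms())             # more general concepts (hypernyms)

car = wn.synsets("car")[0]           # first sense of "car"
print(car.part_meronyms())           # concepts that are parts of a car (part-whole relation)
```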
For example, how can these resources be used inside a system? Here is a rather complicated task. It is called reasoning, and the setting is that there is some story in natural language: for example, Mary got the football, she went to the kitchen, she left the ball there. So we have a story, and after this story we have a question: where is the football now? To answer this question, the machine needs to somehow understand something, right? One way to build such a system is based on deep learning. You might have heard about LSTM networks, which are a particular type of recurrent neural networks. But here you can see that in the data representation you have not only the sequential transition edges but also some other edges. The red edges tell you about coreference. Coreference is another linguistic type of relation between words; it says that "she" is the same as Mary, right? So "she" is just a substitute for Mary. And, for example, this football and that football are the same ball, just mentioned twice. The green edge shows the hypernym relationship that I briefly mentioned: the football is a particular type of ball.

So once we know that our words have some relationships, we can add additional edges to our data structure, and after that we can use the so-called DAG-LSTM, a directed acyclic graph LSTM, that will try to utilize these edges. I am not going to cover the DAG-LSTM model now; I just want you to see that there is a way to use linguistic knowledge for our needs here and to improve the performance of some particular task, for example question answering.

In the rest of the video, I want to cover another example of linguistic information used in a system. This will be about syntax. So let us have just a few more details on how syntax can be represented. Usually these are some kinds of trees. Here you can see a dependency tree, and it says that, for example, the word "shot" is the main word here; it has the subject "I" and the object "elephant", and "elephant" has the modifier "an", and so on. So you have dependencies between the words, and usually you can obtain them with some syntactic parser. Another way to represent syntax is so-called constituency trees. You can see the same sentence at the bottom of the slide, and then you parse it from bottom to top to get this hierarchical structure. You know that "an" and "elephant" are a determiner and a noun, respectively, and you merge them to get a noun phrase. Then you merge it with the verb, which is "shot", and get a verb phrase. You merge it with another subtree and get a bigger verb phrase. And finally, this verb phrase plus the noun phrase "I" gives you the whole sentence. Actually, you can stop at some point, so that you do not parse the whole structure from bottom to top, but just say that it is enough for you to know that, for example, "in my pajamas" is some particular subtree. Why can this be useful? First, this is called shallow parsing, and it is used, for example, in named entity recognition, because a named entity is very likely to be a noun phrase taken as a whole: "New York City" would be a nice noun phrase in some sentence.
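To illustrate shallow parsing, here is a small sketch of noun-phrase chunking with NLTK's RegexpParser on the same example sentence. The chunk grammar is a simplified pattern I chose for illustration, and it again assumes the NLTK tagger models are available.

```python
# A toy sketch of shallow parsing (noun-phrase chunking) with NLTK.
import nltk

sentence = "I shot an elephant in my pajamas"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# A simplified NP pattern: optional determiner or possessive pronoun,
# any number of adjectives, then one or more nouns.
grammar = r"NP: {<DT|PRP\$>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)   # should group "an elephant" and "my pajamas" as NP subtrees
```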
So shallow parsing can help there, but the whole tree can also help in some other tasks, and an example of such a task is sentiment analysis. Sentiment analysis treats reviews as pieces of text and tries to predict whether they are positive or negative, or maybe neutral. Here you can see pluses, minuses, and zeros, which stand for the sentiment. So you have your sentence, and then you parse it with your syntax, so you get those nice subtrees that we have just seen on the previous slide. The idea is that if you know the sentiment of some particular words, for example you know that "humor" is good, then you can try to merge those sentiments to produce the sentiment of the whole phrase. So "intelligent" and "humor" are both good, and together they give you a positive sentiment. But then, when you have "not" in the sentence, you get "not good", which results in a negative sentiment for the whole sentence.

This is a rather advanced approach. It is called recursive neural networks, or directed acyclic graph neural networks, and so on. Sometimes they can be useful, but in many practical cases it is just enough to do some simpler classification. So in the rest of this week, my colleague will discuss the classification task, for example for sentiment analysis, in many, many details.
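Coming back to the compositional sentiment idea above, here is a toy sketch, not the actual recursive neural network, of how word sentiments could be merged bottom-up over a parse tree. The tiny lexicon and the crude negation rule are invented purely for illustration.

```python
# A toy, hand-written sketch of composing sentiment bottom-up over a tree.
# Leaves are words; internal nodes are tuples of their children.
lexicon = {"intelligent": 1, "humor": 1, "boring": -1}   # made-up word sentiments

def sentiment(tree):
    if isinstance(tree, str):                 # leaf: look the word up (0 if unknown)
        return lexicon.get(tree, 0)
    total = sum(sentiment(child) for child in tree)
    if "not" in tree:                         # crude negation: flip the phrase sentiment
        total = -total
    return total

# "intelligent humor" alone is positive, but adding "not" above it flips the
# whole phrase to negative, mimicking the behaviour described on the slide.
print(sentiment(("intelligent", "humor")))            # 2  (positive)
print(sentiment(("not", ("intelligent", "humor"))))   # -2 (negative)
```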