Hey. In this video, we will briefly discuss what will be covered during the next weeks. During this week we will discuss text classification tasks. These are tasks that are very common in applications: for example, you need to predict the sentiment of some reviews, so you need to know whether a review is positive or negative; or you need to filter spam in your emails, or something else. What you actually do is represent your text as a bag of words, compute some useful features, and apply a machine learning algorithm to predict the class of the text. There are lots of practical tips you need to know to succeed in this task, and during this week my colleague will tell you about them.

Now, the next week will be about representing text not as a bag of words but as a sequence. So, what can you do when you represent a text as a sequence of words? One task would be language modeling. Language models are about predicting the probability of the next word given some previous words. This can be used to do text generation, and that is useful in many applications. For example, in machine translation you are given some sequence of words, some sentence in English, and then you need to translate it, let's say to Russian; so you need to generate some Russian text, and that is where you need a language model.

Another important task is called sequence tagging. This is the task where you have a sequence of words and you need to predict a tag for each word in this sequence. For example, it could be part-of-speech tags, so you need to know that some words are nouns, some words are verbs, and so on. Another task would be to find named entities, and this is really useful: for example, you can find names of cities and use them as features for the previous task, text classification. Another task, called semantic slot filling, was just covered in our previous video. This is about slots: for example, you need to parse a query and understand that the person wants to book a table for some specific time in some specific place. That time and place would be the slots for you.
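Just to make the bag-of-words pipeline above concrete, here is a minimal sketch assuming scikit-learn and a tiny made-up set of reviews; the library, data, and model choice are illustrative assumptions, not necessarily what the course itself uses.

```python
# Minimal sketch of a bag-of-words text classifier (hypothetical toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled reviews: 1 = positive, 0 = negative (made up for illustration).
texts = [
    "I loved this movie, great acting",
    "Absolutely wonderful, would watch again",
    "Terrible plot and boring characters",
    "Worst film I have seen this year",
]
labels = [1, 1, 0, 0]

# Bag of words: each review becomes a vector of word counts,
# then a simple linear classifier predicts the class from those counts.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["boring and terrible acting"]))  # likely [0]
print(model.predict(["great, I loved it"]))           # likely [1]
```

The vectorizer plays the role of the "bag of words plus features" step, and the classifier is the machine learning algorithm on top of it.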
Now, we can do something even more complicated and try to understand the meaning of words or of some pieces of text. How do we represent meaning? Well, one easy way to do this is to use vectors. You map all the words to some vectors, let's say 300-dimensional vectors of float numbers, and these vectors have really nice properties: similar words will have similar vectors. For example, this nice picture tells you that "Cappuccino" and "Espresso" are the same kind of thing, just because the cosine similarity between their vectors is really high.

We will also discuss topic models. Topic models deal with documents as a whole and also represent them by vectors, which can tell you what the topics are in those documents. This is really useful when you need to, for example, describe the topics of a big dataset like Wikipedia, some news flows, social networks, or any other text data that you are interested in.

Now, this is just another example of how those methods can be used. Let's say that we represent our words by vectors, just three-dimensional vectors, and we have them depicted here in this space. We know the similarity between them, so we know the distances between those blue dots. Once we know these distances, we can create a similarity graph for words. In the middle picture, the nodes are words, and the edges carry the similarities between the nodes. Now, this graph is actually very useful. Why? Because when you have labels for some nodes of this graph, for example if you know that "Laugh" has the label "Funny," you can try to propagate these labels through the graph, so that words that are similar get the same labels. For example, the word "Haha" there will also get the label "Funny," because it is similar to the word "Laugh." This can be used in many different applications, and we will cover it in week three.
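To make the word-vector and label-propagation ideas concrete, here is a toy sketch with made-up three-dimensional vectors; the words, vectors, and the similarity threshold are purely illustrative assumptions, not the course's actual data.

```python
# Toy illustration of cosine similarity between word vectors and one step
# of propagating a label from a word to its nearest neighbours.
import numpy as np

# Hypothetical 3-dimensional word vectors, made up for demonstration.
vectors = {
    "laugh": np.array([0.9, 0.1, 0.0]),
    "haha":  np.array([0.8, 0.2, 0.1]),
    "tax":   np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similar words get high cosine similarity, dissimilar words get low similarity.
print(cosine(vectors["laugh"], vectors["haha"]))  # close to 1
print(cosine(vectors["laugh"], vectors["tax"]))   # close to 0

# One step of label propagation: "laugh" is labelled "funny", and any word
# whose similarity to "laugh" exceeds a (hypothetical) threshold inherits it.
labels = {"laugh": "funny"}
for word, vec in vectors.items():
    if word not in labels and cosine(vec, vectors["laugh"]) > 0.8:
        labels[word] = "funny"

print(labels)  # {'laugh': 'funny', 'haha': 'funny'}
```

In practice the vectors come from an embedding model and the propagation runs over the whole graph, but the basic mechanism is the same.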
Now, the next week will be more advanced: it will be about sequence-to-sequence tasks. Actually, nearly any task in NLP can somehow be stated as a sequence-to-sequence task. Just to give you a few examples: in machine translation, obviously, you have one sentence and you need to translate it into another sentence, so these are the two sequences. In summarization, you have a big document as input, which is some long sequence, and you need to produce a short summary, which is also a sequence. You get this kind of task in speech recognition, or in a conversational chat-bot, where you have questions and answers. Right? All these tasks can be nicely solved with the so-called encoder-decoder architecture in neural networks. Let us see just the idea. We are given a sentence, and we have an encoder. We feed this sentence to the encoder, and what we get is some hidden representation of the input sentence. After that, the decoder generates the output sentence. This is how we get our final translation, or summary, or something else.

Now, during the last week of our course, we will combine all the knowledge that we have to build a dialogue system. Dialogue systems can be different, and there are at least two important types. One type is goal-oriented agents that try to solve some particular task: for example, they can assist you in a bank, or help you with online shopping, or something like that. On the contrary, there are also conversational, entertaining chat-bots that just want to somehow hold a conversation with you. There are different types of methods to be used in these kinds of tasks, and we will cover them in detail during the last week. The project will be about a Stack Overflow chat-bot that tries to assist with search. So, stay with us, and we will discuss everything in more detail.
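As a closing illustration of the encoder-decoder idea from the sequence-to-sequence week, here is a minimal, untrained sketch assuming PyTorch; the layer sizes, token ids, and greedy decoding loop are illustrative assumptions rather than the course's actual model.

```python
# Minimal encoder-decoder sketch: the encoder compresses the input token ids
# into a hidden vector, and the decoder generates output tokens from it.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, max_len=5, start_id=0):
        # Encode: the whole source sentence becomes one hidden representation.
        _, hidden = self.encoder(self.src_emb(src_ids))
        # Decode: greedily generate tokens, feeding each prediction back in.
        token = torch.full((src_ids.size(0), 1), start_id, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            step, hidden = self.decoder(self.tgt_emb(token), hidden)
            token = self.out(step).argmax(dim=-1)
            outputs.append(token)
        return torch.cat(outputs, dim=1)

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.tensor([[5, 17, 42, 8]])  # a fake "source sentence" of token ids
print(model(src))                     # shape (1, 5): generated target token ids
```

A real system would of course be trained on parallel data and use attention, but the encode-then-decode structure is exactly the idea described above.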