Later on in this course, we will show how natural language processing techniques can be used to extract deeper information from human language.

Let's now recap what we've learned this week. We started with information theory from Shannon, which was invented for a totally different context, looked at the concept of mutual information and applied it to the statistics of language, and showed how we could summarize documents and discern the best possible keywords using TF-IDF, which turned out to be exactly the same thing as mutual information. We figured out the relationship between communication and machine learning, again in terms of mutual information, and learned the very important naive Bayes classifier, which is the foundation for almost all machine learning techniques.

Then we summarized the limits of machine learning from the information-theoretic perspective, which also told us which features to use and which not to use. And lastly, we ended with some suspicions about whether the bag-of-words approach we'd used, which considers words without their grammatical syntax or semantics, was actually enough to discern meaning.

In future classes we will ask questions such as where the features themselves come from. For the moment, we have chosen features like words, and we have labeled past data manually or by experience, such as buyers and browsers. In real life, however, the labels and the features need to be derived automatically by us as we learn about the world, with no supervision and nobody telling us what is a feature and what is not.

Before we come to those very interesting ideas in the world of learning, we'll first take an excursion into big data technology next week, as promised. We'll describe how the new technologies that were developed in the web world differ significantly from traditional technologies. Then we'll do some experiments and assignments on how they can be used for indexing, PageRank, computing TF-IDF, implementing naive Bayes classifiers, computing mutual information, and all the nice things we have learned so far, including the locality-sensitive hashing we did last week.

We've learned a lot of theory and done some calculations. Now get ready to do some implementation and programming. See you next week, and of course, don't forget to submit your homework by Monday.
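As a warm-up for next week's implementation work, here is a minimal sketch of computing TF-IDF keywords over a toy corpus. The corpus, the function name tf_idf, and the specific tf * log(N/df) weighting variant are illustrative assumptions, not the exact formulation from lecture.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each tokenized document with tf * log(N / df).

    One standard TF-IDF variant; the lecture's exact normalization
    may differ slightly.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

# Toy corpus (illustrative): each document is a bag of words.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "information theory and mutual information".split(),
]
for i, doc_scores in enumerate(tf_idf(corpus)):
    # Terms appearing in every document get score 0 (log 1), so
    # common words like "the" are never the best keyword.
    best = max(doc_scores, key=doc_scores.get)
    print(f"doc {i}: best keyword = {best!r}")
```

Note how the log(N/df) factor zeroes out terms that occur in every document, which is exactly the intuition that uninformative words carry no mutual information about which document you are reading.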
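And since the assignments will include implementing a naive Bayes classifier, here is a minimal sketch under the bag-of-words assumption, reusing the lecture's buyers-versus-browsers labels. The training sentences and the use of Laplace (add-one) smoothing are my own illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Estimate log priors and Laplace-smoothed log word likelihoods."""
    class_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in labeled_docs:
        word_counts[label].update(doc)
        vocab.update(doc)
    priors = {c: math.log(n / len(labeled_docs))
              for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing so unseen words never get probability zero.
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return priors, likelihoods

def classify(doc, priors, likelihoods):
    """Pick the class maximizing log P(class) + sum log P(word | class)."""
    def score(c):
        return priors[c] + sum(
            likelihoods[c][w] for w in doc if w in likelihoods[c]
        )
    return max(priors, key=score)

# Illustrative training data: tokenized sessions labeled by outcome.
train = [
    ("add to cart checkout payment".split(), "buyer"),
    ("checkout shipping address confirm".split(), "buyer"),
    ("just browsing the homepage".split(), "browser"),
    ("reading reviews and comparing".split(), "browser"),
]
priors, likelihoods = train_naive_bayes(train)
print(classify("checkout and payment".split(), priors, likelihoods))
# -> buyer
```

Working in log space keeps the products of many small word probabilities from underflowing, which matters once documents get longer than a few words.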