Later on in this course, we will show how natural language processing techniques can be used to extract deeper information from human language.

Let's now recap what we've learned this week. We started with information theory from Shannon, which was invented for a totally different context, looked at the concept of mutual information and applied it to the statistics of language, and showed how we could summarize documents and discern the best possible keywords using TF-IDF, which turned out to be exactly the same thing as mutual information. We figured out the relationship between communication and machine learning, again in terms of mutual information, and learned the very important naive Bayes classifier, which is the foundation for almost all machine learning techniques.

Then we summarized the limits of machine learning from the information-theoretic perspective, which also told us which features to use and which not to use. And lastly, we ended with some suspicions about whether the bag-of-words approach we'd used, which considers words without their grammatical syntax or semantics, was actually enough to discern meaning.

In future classes we will ask questions such as where the features themselves come from. For the moment, we have chosen features like words, and we have labeled past data manually or by experience, such as buyers and browsers. In real life, however, the labels and the features need to be derived automatically by us as we learn about the world, with no supervision and nobody telling us what is a feature and what is not.

Before we come to those very interesting ideas in the world of learning, we'll first take an excursion into big data technology next week, as promised. We'll describe how the new technologies that were developed in the web world differ significantly from traditional technologies. Then we'll do some experiments and assignments on how they can be used for indexing, PageRank, computing TF-IDF, implementing naive Bayes classifiers, computing mutual information, and all the nice things we have learned so far, including the locality-sensitive hashing we did last week.

We've learned a lot of theory and done some calculations. Now get ready to do some implementation and programming. See you next week, and of course, don't forget to submit your homework by Monday.
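As a warm-up for next week's implementation work, here is a minimal sketch of computing TF-IDF keywords over a toy corpus. The corpus, the function name tf_idf, and the specific tf * log(N/df) weighting variant are illustrative assumptions, not the exact formulation from lecture.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each tokenized document with tf * log(N / df).

    One standard TF-IDF variant; the lecture's exact normalization
    may differ slightly.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

# Toy corpus (illustrative): each document is a bag of words.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "information theory and mutual information".split(),
]
for i, doc_scores in enumerate(tf_idf(corpus)):
    # Terms appearing in every document get score 0 (log 1), so
    # common words like "the" are never the best keyword.
    best = max(doc_scores, key=doc_scores.get)
    print(f"doc {i}: best keyword = {best!r}")
```

Note how the log(N/df) factor zeroes out terms that occur in every document, which is exactly the intuition that uninformative words carry no mutual information about which document you are reading.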
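And since the assignments will include implementing a naive Bayes classifier, here is a minimal sketch under the bag-of-words assumption, reusing the lecture's buyers-versus-browsers labels. The training sentences and the use of Laplace (add-one) smoothing are my own illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Estimate log priors and Laplace-smoothed log word likelihoods."""
    class_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in labeled_docs:
        word_counts[label].update(doc)
        vocab.update(doc)
    priors = {c: math.log(n / len(labeled_docs))
              for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing so unseen words never get probability zero.
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return priors, likelihoods

def classify(doc, priors, likelihoods):
    """Pick the class maximizing log P(class) + sum log P(word | class)."""
    def score(c):
        return priors[c] + sum(
            likelihoods[c][w] for w in doc if w in likelihoods[c]
        )
    return max(priors, key=score)

# Illustrative training data: tokenized sessions labeled by outcome.
train = [
    ("add to cart checkout payment".split(), "buyer"),
    ("checkout shipping address confirm".split(), "buyer"),
    ("just browsing the homepage".split(), "browser"),
    ("reading reviews and comparing".split(), "browser"),
]
priors, likelihoods = train_naive_bayes(train)
print(classify("checkout and payment".split(), priors, likelihoods))
# -> buyer
```

Working in log space keeps the products of many small word probabilities from underflowing, which matters once documents get longer than a few words.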