We've seen a number of techniques, some in previous lectures and some we talked about in this lecture. We've seen the Naive Bayes classifier in fair detail. We've seen some probabilistic graphical models, like Bayesian networks. We've seen linear regression in detail, and this time we also heard about logistic regression, neural networks, and support vector machines, at least in terms of what they are.

Now, let's look at the problem and see which kinds of techniques one would need to consider depending on the nature of the problem. We classify the problem in terms of the kind of features it has, whether they are numerical or categorical, that is, numbers or classes, and the target variable, which is what we are trying to predict. We might be predicting a value, in which case it becomes a prediction problem where the value is numerical. We might be predicting a class, in which case it's a classification problem, which is part of learning theory.

Techniques can be used somewhat interchangeably across these two types of prediction based on the kinds of features, though of course some techniques are more applicable than others.

In the most straightforward case, if we have numerical features, a numerical target we want to predict, and a correlation that is stable and fairly linear, we'd use linear regression. Now, when I say stable and fairly linear, even in situations like this one would still prefer linear regression over some complicated non-linear function, because high-order functions, say squares or cubes and sines and cosines, will tend to overfit the data and will not generalize to situations that may arise in the future. So right now you might have a great fit to the training data, but it really doesn't work in practice. Linear regression is preferred unless you have some real reason not to use linear techniques.

Similarly, even if your features are categorical and your target is numerical, you can still use linear regression, but you have to code the features. If a feature takes, say, eight different values, you replace that feature with eight binary variables, each taking the value zero or one depending on which category value the feature took. It's better to do this with binary coding as opposed to, say, numerical coding: if red is coded as five, blue as six, and green as seven, there is no reason to believe that red and blue are closer than red and green. Using five, six, and seven is misleading and can make the regression technique go haywire. So using three separate zero/one features to indicate whether something is red, blue, or green is better than using numbers.
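As a concrete illustration of this coding trick, here is a minimal sketch in Python, assuming a recent scikit-learn is available; the toy colors, sizes, and target values are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Toy data: one categorical feature (color) and one numerical feature (size).
colors = np.array([["red"], ["blue"], ["green"], ["red"], ["green"]])
sizes = np.array([[1.0], [2.0], [3.0], [1.5], [2.5]])
y = np.array([10.0, 14.0, 19.0, 11.0, 18.0])

# One 0/1 column per category, instead of red=5, blue=6, green=7,
# which would impose a spurious ordering on the colors.
encoder = OneHotEncoder(sparse_output=False)
color_binary = encoder.fit_transform(colors)

# Stack the binary color columns next to the numerical feature and
# fit an ordinary linear regression on the combined matrix.
X = np.hstack([color_binary, sizes])
model = LinearRegression().fit(X, y)
print(model.predict(X))
```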
When we have categorical variables and a numerical target, neural networks can also be used, just as they can be used for normal linear regression. But they have somewhat waned in popularity, except for certain situations which we will talk about in the next section.

Now, let's come to the case where we have unstable or severely non-linear situations, which might look something like this, as we have seen before. There is no way one can fit a straight line to this parabolic curve, and therefore it's better to use a neural network, which has non-linear elements, multiple levels, and hidden layers, so that a more complicated function can be learned. At the same time, one is not pre-supposing that it's going to be a parabola; that would be counter-productive, because one would be pre-supposing the nature of f rather than letting a neural network, with its many degrees of freedom, discover it. (A sketch of this case follows after this passage.)

Next, we come to the classification situations, where the target variable is categorical. Of course, when we have categorical features and categorical targets, we have seen how to use Naive Bayes and other probabilistic graphical models. These days, SVMs, or Support Vector Machines, are also very popular for classification. For categorical variables one does have to do feature coding to a certain extent: we need the same trick that we did for categorical features in linear regression, because SVMs essentially require numerical inputs. If you have numerical inputs and a categorical classification, SVMs are perfect. They are designed especially for those situations where you have unstable and severely non-linear correlations, and that is what they do very well.
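Here is a minimal sketch of the non-linear regression case from the passage above: a small neural network fitting a parabolic relationship that no straight line can capture. It assumes scikit-learn's MLPRegressor; the synthetic data is illustrative only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data: a parabola plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.1, size=200)

# A hidden layer of non-linear units lets the network discover the
# curvature instead of us pre-supposing the form of f.
net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
net.fit(X, y)
print(net.predict([[2.0]]))  # should come out near 4.0
```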
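And a minimal sketch of the SVM case just discussed, again assuming scikit-learn; the make_circles dataset stands in for any severely non-linear classification problem.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# SVMs need numerical inputs; categorical features would first get the
# same one-hot coding used for linear regression above.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # close to 1.0 on this toy data
```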
On the other hand, if you have fairly stable, linear correlations and you do have a classification problem, then rather than using linear regression, as we have seen, one should use logistic regression, where one is bumping the distance from the separating line up or down using the logistic function. (A sketch of this case appears at the end of this section.)

So, take a look at this table. It will definitely guide you in the problem set, or rather the programming assignment, on prediction. But in general, too, it's something you should learn from. We have not covered many techniques yet; we've only taken up a very few. Further, we've only talked about classification and prediction; optimization and control we haven't talked about, and we won't have time to get into those in this course.
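As promised above, here is a minimal sketch of the logistic regression case, assuming scikit-learn; make_classification produces an illustrative, roughly linear two-class problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A roughly linearly separable two-class problem.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# The logistic function squashes the signed distance from the separating
# line into a probability between 0 and 1, rather than predicting a raw value.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # class probabilities for the first few points
```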