We've seen a number of techniques, some in previous lectures and some we talked about in this lecture. We've seen the Naive Bayes classifier in fair detail. We've seen some probabilistic graphical models, like Bayesian networks. We've seen linear regression in detail, and this time we also heard about logistic regression, neural networks, and support vector machines, at least in terms of what they are.

Now, let's look at the problem and see which kinds of techniques one would need to consider depending on the nature of the problem. We classify the problem in terms of the kind of features it has, whether they are numerical or categorical, that is, numbers or classes, and the target variable, which is what we are trying to predict. We might be predicting a value, in which case it becomes a prediction problem where the value is numerical. We might be predicting a class, in which case it's a classification problem, which is part of learning theory.

Techniques can be used somewhat interchangeably across these two types of prediction based on the kinds of features, though of course some techniques are more applicable than others.

In the most straightforward case, if we have numerical features, a numerical target we want to predict, and a correlation that is stable and fairly linear, we'd use linear regression. Now, when I say stable and fairly linear, even in situations like this one would still prefer linear regression over some complicated non-linear function, because high-order functions, say squares or cubes and sines and cosines, will tend to overfit the data and will not generalize to situations that may arise in the future. So right now you might have a great fit to the training data, but it really doesn't work in practice. Linear regression is preferred unless you have some real reason not to use linear techniques.

Similarly, even if your features are categorical and your target is numerical, you can still use linear regression, but you have to code the features. If a feature takes, say, eight different values, you replace that feature with eight binary variables, each taking the value zero or one depending on which category value the feature took. It's better to do this with binary coding as opposed to, say, numerical coding: if red is coded as five, blue as six, and green as seven, there is no reason to believe that red and blue are closer than red and green. Using five, six, and seven is misleading and can make the regression technique go haywire. So using three separate zero/one features to indicate whether something is red, blue, or green is better than using numbers.
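As a concrete illustration of this coding trick, here is a minimal sketch in Python, assuming a recent scikit-learn is available; the toy colors, sizes, and target values are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

# Toy data: one categorical feature (color) and one numerical feature (size).
colors = np.array([["red"], ["blue"], ["green"], ["red"], ["green"]])
sizes = np.array([[1.0], [2.0], [3.0], [1.5], [2.5]])
y = np.array([10.0, 14.0, 19.0, 11.0, 18.0])

# One 0/1 column per category, instead of red=5, blue=6, green=7,
# which would impose a spurious ordering on the colors.
encoder = OneHotEncoder(sparse_output=False)
color_binary = encoder.fit_transform(colors)

# Stack the binary color columns next to the numerical feature and
# fit an ordinary linear regression on the combined matrix.
X = np.hstack([color_binary, sizes])
model = LinearRegression().fit(X, y)
print(model.predict(X))
```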
When we have categorical variables and a numerical target, neural networks can also be used, just as they can be used for normal linear regression. But they have somewhat waned in popularity, except for certain situations which we will talk about in the next section.

Now, let's come to the case where we have unstable or severely non-linear situations, which might look something like this, as we have seen before. There is no way one can fit a straight line to this parabolic curve, and therefore it's better to use a neural network, which has non-linear elements, multiple levels, and hidden layers, so that a more complicated function can be learned. At the same time, one is not pre-supposing that it's going to be a parabola; that would be counter-productive, because one would be pre-supposing the nature of f rather than letting a neural network, with its many degrees of freedom, discover it. (A sketch of this case follows after this passage.)

Next, we come to the classification situations, where the target variable is categorical. Of course, when we have categorical features and categorical targets, we have seen how to use Naive Bayes and other probabilistic graphical models. These days, SVMs, or Support Vector Machines, are also very popular for classification. For categorical variables one does have to do feature coding to a certain extent: we need the same trick that we did for categorical features in linear regression, because SVMs essentially require numerical inputs. If you have numerical inputs and a categorical classification, SVMs are perfect. They are designed especially for those situations where you have unstable and severely non-linear correlations, and that is what they do very well.
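Here is a minimal sketch of the non-linear regression case from the passage above: a small neural network fitting a parabolic relationship that no straight line can capture. It assumes scikit-learn's MLPRegressor; the synthetic data is illustrative only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data: a parabola plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.1, size=200)

# A hidden layer of non-linear units lets the network discover the
# curvature instead of us pre-supposing the form of f.
net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
net.fit(X, y)
print(net.predict([[2.0]]))  # should come out near 4.0
```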
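And a minimal sketch of the SVM case just discussed, again assuming scikit-learn; the make_circles dataset stands in for any severely non-linear classification problem.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# SVMs need numerical inputs; categorical features would first get the
# same one-hot coding used for linear regression above.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # close to 1.0 on this toy data
```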
On the other hand, if you have fairly stable, linear correlations and you do have a classification problem, then rather than using linear regression, as we have seen, one should use logistic regression, where one is bumping the distance from the separating line up or down using the logistic function. (A sketch of this case appears at the end of this section.)

So, take a look at this table. It will definitely guide you in the problem set, or rather the programming assignment, on prediction. But in general, too, it's something you should learn from. We have not covered many techniques yet; we've only taken up a very few. Further, we've only talked about classification and prediction; optimization and control we haven't talked about, and we won't have time to get into those in this course.
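As promised above, here is a minimal sketch of the logistic regression case, assuming scikit-learn; make_classification produces an illustrative, roughly linear two-class problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A roughly linearly separable two-class problem.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# The logistic function squashes the signed distance from the separating
# line into a probability between 0 and 1, rather than predicting a raw value.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))  # class probabilities for the first few points
```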