1 00:00:04,920 --> 00:00:07,270 Let's take a couple minutes now to dig in and 2 00:00:07,270 --> 00:00:10,630 see what's going to happen in each module of this course. 3 00:00:10,630 --> 00:00:11,880 So this was the overview. 4 00:00:11,880 --> 00:00:15,740 We're going to have 9 modules, and some concepts are going to appear in 5 00:00:15,740 --> 00:00:19,870 multiple modules, while some modules are going to cover just one concept. 6 00:00:19,870 --> 00:00:22,880 Some concepts are going to span multiple modules. 7 00:00:22,880 --> 00:00:27,400 But in general, we're going to see a pretty cohesive presentation where we start 8 00:00:27,400 --> 00:00:30,930 with everything about linear classifiers and logistic regression and 9 00:00:30,930 --> 00:00:32,750 everything about decision trees. 10 00:00:32,750 --> 00:00:36,120 And then we'll cover boosting and some advanced topics. 11 00:00:37,710 --> 00:00:40,400 So in particular, we're going to start with linear classifiers. 12 00:00:40,400 --> 00:00:42,850 And we discussed those in the first course. 13 00:00:42,850 --> 00:00:48,050 So for example, in a sentiment analysis case we have two words that I care about: 14 00:00:48,050 --> 00:00:50,590 the number of times the word awful appears in the review and 15 00:00:50,590 --> 00:00:53,180 the number of times the word awesome appears in the review. 16 00:00:53,180 --> 00:00:57,070 And the linear classifier might say that every awesome is worth plus one, and 17 00:00:57,070 --> 00:00:58,780 every awful is worth minus 1.5. 18 00:00:58,780 --> 00:01:03,830 And when you have a particular review, for 19 00:01:03,830 --> 00:01:07,650 example like the one at the bottom with three awesomes and zero awfuls, we'll 20 00:01:07,650 --> 00:01:12,350 classify it as positive because its score is greater than zero, and 21 00:01:12,350 --> 00:01:14,440 that's true for everything below that line. 22 00:01:14,440 --> 00:01:18,030 And everything above the line has a score of less than zero, and 23 00:01:18,030 --> 00:01:20,440 we're going to classify those as negative points. 24 00:01:20,440 --> 00:01:23,400 We discussed this concept of linear classifiers in the first course. 25 00:01:23,400 --> 00:01:26,360 We'll go in depth and really understand those in the first module. 26 00:01:26,360 --> 00:01:30,060 But then we'll extend it with something called logistic regression, 27 00:01:30,060 --> 00:01:33,500 which is going to allow us to not only predict plus one or 28 00:01:33,500 --> 00:01:38,230 minus one, but also assign a probability to every data point. 29 00:01:38,230 --> 00:01:42,420 So for example, the points in bright green are very likely to be positive. 30 00:01:42,420 --> 00:01:47,160 The points in bright fuchsia are very likely to be negative. 31 00:01:47,160 --> 00:01:49,928 But the white band in between is an area where we're less certain, 32 00:01:49,928 --> 00:01:51,630 so it's a 50/50 probability. 33 00:01:51,630 --> 00:01:53,730 We're going to be able to predict those probabilities, and 34 00:01:53,730 --> 00:01:58,540 that's going to really change how our classifiers get used. 35 00:01:58,540 --> 00:02:01,430 Now that we've defined linear classifiers and logistic regression, 36 00:02:01,430 --> 00:02:05,510 in the second module we're going to figure out how to learn the parameters, 37 00:02:05,510 --> 00:02:08,690 the coefficients of the classifier, from data.
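To make the sentiment example above concrete, here is a minimal sketch in Python of the scoring and probability computation just described. The coefficients are simply the plus one / minus 1.5 values from the example, and the names (score_review, sigmoid) are illustrative, not anything taken from the course materials.

import math

# Word coefficients from the example: each "awesome" is worth +1,
# each "awful" is worth -1.5.
coefficients = {"awesome": 1.0, "awful": -1.5}

def score_review(word_counts):
    # Linear classifier score: sum of coefficient * count over the words we care about.
    return sum(coefficients.get(word, 0.0) * count
               for word, count in word_counts.items())

def sigmoid(score):
    # Logistic regression maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

review = {"awesome": 3, "awful": 0}          # the review at the bottom of the plot
s = score_review(review)                     # 3*1.0 + 0*(-1.5) = 3.0 > 0, so predict +1
print("P(positive):", round(sigmoid(s), 2))  # about 0.95, well away from the 50/50 white band

Here the coefficients are fixed by hand; the second module is about learning them from data instead.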
38 00:02:08,690 --> 00:02:12,530 So we'll define the notion of likelihood, which measures how well a line 39 00:02:13,590 --> 00:02:18,650 classifies the data, and for different values of the coefficients 40 00:02:18,650 --> 00:02:21,650 we're going to have different lines, or different classifiers, and 41 00:02:21,650 --> 00:02:27,230 we're going to use gradient descent to find the best possible classifier, 42 00:02:27,230 --> 00:02:32,290 in a similar way to what we did in linear regression in the previous course. 43 00:02:33,970 --> 00:02:37,340 Overfitting can be a really significant problem in classification. 44 00:02:37,340 --> 00:02:40,140 So here I'm plotting the classification error on the y-axis, 45 00:02:40,140 --> 00:02:42,660 as we make models more and more complex. 46 00:02:42,660 --> 00:02:43,840 And as you know, 47 00:02:43,840 --> 00:02:48,090 the training error will go to zero as you make the model more complex. 48 00:02:48,090 --> 00:02:53,620 But the true error will go down and eventually go up as we overfit. 49 00:02:53,620 --> 00:02:56,560 So, in classification, the way we're going to experience that is our decision 50 00:02:56,560 --> 00:03:00,215 boundaries will start very simple, like a line separating positives and negatives, 51 00:03:00,215 --> 00:03:04,370 and then maybe become a curve, a parabola, that fits the data pretty well. 52 00:03:04,370 --> 00:03:07,310 But as we make the model too complex, we end up with these 53 00:03:07,310 --> 00:03:11,580 crazy decision boundaries, and eventually, in the last one here, we see a boundary fitting 54 00:03:11,580 --> 00:03:16,306 the data way too well, in an unbelievable way. 55 00:03:16,306 --> 00:03:20,230 So we'll see these amazing visualizations, and we'll address this issue by 56 00:03:20,230 --> 00:03:24,170 introducing regularization in a very similar way to what we did with linear 57 00:03:24,170 --> 00:03:27,830 regression in the previous course, and that will be the focus of the third module. 58 00:03:29,250 --> 00:03:30,280 In the fourth module, 59 00:03:30,280 --> 00:03:34,140 we're going to explore a whole new kind of classifier called the decision tree. 60 00:03:34,140 --> 00:03:36,720 Decision trees are extremely useful in practice. 61 00:03:36,720 --> 00:03:38,740 They're simple, easy to understand. 62 00:03:38,740 --> 00:03:43,870 They can fit data really well, providing nonlinear fits to the data. 63 00:03:43,870 --> 00:03:48,140 So, here's an example where I'm trying to predict whether a loan I might take out 64 00:03:48,140 --> 00:03:52,760 at the bank is going to be a good, safe loan, or is it going to be a risky loan. 65 00:03:52,760 --> 00:03:56,960 So, you might walk into the bank and the loan officer might ask you, what's your 66 00:03:56,960 --> 00:04:00,750 credit history like? Have you been really good, have you paid your previous loans? 67 00:04:00,750 --> 00:04:04,950 If you have, then the loan is likely to be safe, likely to be a good loan. 68 00:04:04,950 --> 00:04:08,960 If your credit has just been okay, fair, then it depends. 69 00:04:08,960 --> 00:04:11,230 If it's a short loan, then I'm worried, but 70 00:04:11,230 --> 00:04:14,130 if it's a long loan, then eventually you'll pay it off. 71 00:04:14,130 --> 00:04:19,340 Now, if your credit history is bad, don't worry, not all is lost. 72 00:04:19,340 --> 00:04:20,780 Maybe they'll ask about your income.
73 00:04:20,780 --> 00:04:23,890 If your income is high and you're asking for a 74 00:04:23,890 --> 00:04:26,540 long-term loan, then maybe it's okay to make you the loan. 75 00:04:26,540 --> 00:04:30,070 But if your credit history is bad and your income is low, then, you know, 76 00:04:30,070 --> 00:04:31,890 don't even ask for it. 77 00:04:31,890 --> 00:04:36,510 So that's how a decision tree can capture these really elaborate, yet 78 00:04:36,510 --> 00:04:39,670 very explainable, cuts over the data (see the sketch below). 79 00:04:40,830 --> 00:04:45,850 In Module 5, we're going to see that overfitting is not just a bad, bad problem 80 00:04:45,850 --> 00:04:50,450 with logistic regression, but it's also a bad, bad problem with decision trees. 81 00:04:50,450 --> 00:04:54,730 Here, as you make those trees deeper and deeper and deeper, 82 00:04:54,730 --> 00:04:58,980 those decision boundaries can get very, very complicated, and really overfit. 83 00:04:58,980 --> 00:05:01,870 So, we're going to have to do something about it. 84 00:05:01,870 --> 00:05:06,100 What we're going to do is use a fundamental concept called Occam's Razor, 85 00:05:06,100 --> 00:05:09,100 where you try to find the simplest explanation for your data. 86 00:05:09,100 --> 00:05:14,440 And this concept goes back way before Occam, who was around in the 13th century. 87 00:05:14,440 --> 00:05:18,130 It goes back to Pythagoras and Aristotle, and 88 00:05:18,130 --> 00:05:22,870 those folks said the simplest explanation is often the best one. 89 00:05:22,870 --> 00:05:27,807 So we're going to take these really complex, deep trees and find simpler ones that give 90 00:05:27,807 --> 00:05:31,248 you better performance and are less prone to overfitting. 91 00:05:31,248 --> 00:05:31,748 [MUSIC]
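Here is the promised sketch of the loan example written as a decision tree of nested if/else branches in Python. The feature names and splits simply mirror the story told above and are illustrative; in the course, trees like this are learned automatically from data rather than written by hand.

# A hand-written stand-in for the loan decision tree described above.
def predict_loan(credit, term, income):
    if credit == "excellent":
        return "safe"                                    # good history: likely a good loan
    elif credit == "fair":
        return "risky" if term == "short" else "safe"    # fair credit: the loan term decides
    else:                                                # bad credit history
        if income == "high" and term == "long":
            return "safe"                                # high income, long term: maybe okay
        return "risky"                                   # low income: don't even ask

print(predict_loan("excellent", "short", "low"))   # safe
print(predict_loan("fair", "short", "high"))       # risky
print(predict_loan("poor", "long", "high"))        # safe

Each branch is a simple, explainable cut on one feature, which is what makes the resulting predictions easy to justify.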