Let's take a couple of minutes now to dig in and see what's going to happen in each module of this course. That was the overview: we're going to have nine modules, and some models will appear in multiple modules, some in just one, and some concepts will span several modules. But in general, it's going to be a pretty cohesive presentation, where we start with everything about linear classifiers and logistic regression, then everything about decision trees, and then we'll cover boosting and some advanced topics.

In particular, we're going to start with linear classifiers, which we discussed in the first course. For example, in a sentiment analysis setting there might be two words I care about: the number of times the word awful appears in a review, and the number of times the word awesome appears. A linear classifier might say that every awesome is worth +1 and every awful is worth -1.5. So when you have a particular review, like the one at the bottom with three awesomes and zero awfuls, we classify it as positive because its score is greater than zero, and that's true for everything below that line. Everything above the line has a score less than zero, and we classify those points as negative. We touched on linear classifiers in the first course; in the first module we'll go in depth and really understand them. Then we'll extend them with something called logistic regression, which allows us not only to predict plus one or minus one, but to assign a probability to every data point. For example, the points in the bright green region are very likely to be positive, the points in the other brightly colored region are very likely to be negative, but the white band in between is where we're less certain, closer to a 50/50 probability. Being able to predict those probabilities is really going to change how our classifiers get used. (There's a small code sketch of this scoring right after this overview.)

Now that we know how to define a linear classifier and logistic regression, in the second module we're going to figure out how to learn the parameters, the coefficients of the classifier, from data. We'll define the notion of likelihood, which measures how well a line classifies the data; different values of the coefficients give different lines, or different classifiers, and we'll use gradient descent to find the best possible classifier, in a similar way to what we did for linear regression in the previous course.

Overfitting can be a really significant issue in classification. Here I'm plotting the classification error on the y-axis as we make models more and more complex. As you know, the training error goes to zero as the model gets more complex, but the true error goes down and eventually comes back up as we overfit. In classification, the way we experience that is that our decision boundaries start out very simple, like a line separating positives from negatives, and then maybe become a curved parabola that fits the data pretty well. But as we make the model too complex, we end up with these crazy decision boundaries, and the last one here is fitting the data way too well, in an unbelievable way. So we'll see these amazing visualizations, and we'll address the issue by introducing regularization, in a very similar way to what we did with linear regression in the previous course. That will be the focus of the third module.
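To make that concrete, here is a minimal Python sketch of the awesome/awful scoring, and of how logistic regression turns that same score into a probability. It assumes the coefficients from the example (+1.0 per awesome, -1.5 per awful) and a zero intercept; the intercept and the particular reviews are assumptions for the sketch, not the course's own code or data.

```python
from math import exp

# Coefficients from the lecture's sentiment example:
# each "awesome" adds +1.0 to the score, each "awful" adds -1.5.
# (The intercept of 0.0 is an assumption for this sketch.)
coefficients = {"awesome": 1.0, "awful": -1.5}
intercept = 0.0

def score(word_counts):
    """Linear classifier score: weighted sum of word counts plus intercept."""
    return intercept + sum(coefficients.get(word, 0.0) * count
                           for word, count in word_counts.items())

def predict_class(word_counts):
    """Predict +1 if the score is positive, -1 otherwise."""
    return +1 if score(word_counts) > 0 else -1

def predict_probability(word_counts):
    """Logistic regression: squash the score through a sigmoid to get P(y = +1)."""
    return 1.0 / (1.0 + exp(-score(word_counts)))

# The review from the lecture: three "awesome"s, zero "awful"s.
review = {"awesome": 3, "awful": 0}
print(score(review))                           # 3.0 -> positive side of the boundary
print(predict_class(review))                   # +1
print(round(predict_probability(review), 3))   # ~0.953, very likely positive

# A review sitting right on the boundary gets a 50/50 probability.
uncertain = {"awesome": 3, "awful": 2}
print(round(predict_probability(uncertain), 3))  # score 0.0 -> 0.5
```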
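And here is a rough sketch of what learning those coefficients from data can look like: it maximizes the log-likelihood by gradient ascent, which is the same thing as gradient descent on the negative log-likelihood. The toy reviews, step size, and iteration count are made up for illustration, not the course's actual data or implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, w):
    """Log-likelihood of labels y in {+1, -1} under coefficients w (at most 0)."""
    scores = X @ w
    return np.sum(np.log(sigmoid(y * scores)))

def fit_logistic_regression(X, y, step_size=0.1, n_iters=500):
    """Gradient ascent on the log-likelihood (a minimal sketch, no regularization)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Indicator (y == +1) minus the predicted probability of +1.
        errors = (y == +1).astype(float) - sigmoid(X @ w)
        gradient = X.T @ errors
        w += step_size * gradient
    return w

# Toy data: columns are [#awesome, #awful], labels are the sentiment.
X = np.array([[3, 0], [1, 0], [0, 2], [1, 3], [2, 1], [0, 1]], dtype=float)
y = np.array([+1, +1, -1, -1, +1, -1])

w = fit_logistic_regression(X, y)
print(w)                          # learned coefficients for awesome / awful
print(sigmoid(X @ w))             # predicted P(y = +1) for each review
print(log_likelihood(X, y, w))    # quality of the fit (higher is better)
```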
In the fourth module, we're going to explore a whole new kind of classifier called the decision tree. Decision trees are extremely useful in practice: they're simple, easy to understand, and they can fit data really well, capturing nonlinear relationships in the data. Here's an example where I'm trying to predict whether a loan I might take out at the bank is going to be a good, safe loan, or a risky one. You might walk into the bank, and the loan officer might ask: what's your credit history like? Have you been really good, have you paid off your previous loans? If you have, then the loan is likely to be safe, likely to be a good loan. If your credit has just been okay, fair, then it depends: if it's a short-term loan, then I'm worried, but if it's a long-term loan, then eventually you'll pay it off. Now, if your credit history is bad, don't worry, not everything is lost; maybe they'll ask about your income. If your income is high and you're asking for a long-term loan, then maybe it's okay to make the loan. But if your credit history is bad and your income is low, then, you know, don't even ask. That's how a decision tree can capture these really elaborate, yet very explainable, cuts over the data. (There's a small code sketch of this tree at the end of this section.)

In Module 5, we're going to see that overfitting is not just a bad, bad problem with logistic regression, it's also a bad, bad problem with decision trees. Here, as you make those trees deeper and deeper and deeper, the decision boundaries can get very, very complicated and really overfit. So we're going to have to do something about it. What we're going to do is use a fundamental concept called Occam's razor, where you try to find the simplest explanation for your data. This concept goes back way before Occam, who was around in the 13th century; it goes back to Pythagoras and Aristotle, and those folks said that the simplest explanation is often the best one. So we're going to take these really complex, deep trees and find simpler ones that give you better performance and are less prone to overfitting. [MUSIC]
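As a rough illustration of those explainable cuts, here is the loan tree from the example above written out as hand-coded rules in Python. The category names ("excellent", "fair", "poor", "short", "long", "high", "low") and the handling of the one case the lecture doesn't spell out are assumptions for the sketch; in the module itself, a tree like this is learned from data rather than written by hand.

```python
def loan_safety(credit, term, income):
    """Hand-coded sketch of the loan decision tree described in the lecture.

    credit: "excellent", "fair", or "poor"
    term:   "short" or "long"
    income: "high" or "low"
    Returns "safe" or "risky".
    """
    if credit == "excellent":
        # Great credit history: the loan is likely to be safe.
        return "safe"
    elif credit == "fair":
        # Okay credit: it depends on the term of the loan.
        return "risky" if term == "short" else "safe"
    else:  # credit == "poor"
        # Bad credit: look at income next.
        if income == "high" and term == "long":
            return "safe"
        # Low income, or the high-income short-term case the lecture
        # doesn't cover: treated as risky in this sketch.
        return "risky"

# A few example applicants.
print(loan_safety("excellent", "short", "low"))  # safe
print(loan_safety("fair", "short", "high"))      # risky
print(loan_safety("poor", "long", "high"))       # safe
```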