[MUSIC] So we've seen the logistic regression model and explored it quite a bit, and we hinted at what learning means: finding the best parameters for those models. However, we've talked about features in a kind of abstract way. We said we have the number of awesomes, the number of awfuls, and so on. But we have to think a little harder when our inputs are categorical variables.

So let's take a little example. If our inputs x were numeric values like the number of awesomes, somebody's age, or somebody's salary, it's natural to multiply them by particular coefficients. So 1.5 times the number of awesomes makes sense, and 17 times your salary kind of makes sense as a numeric value in that score function. However, consider categorical inputs like whether somebody is male or female, their country of birth, or the postal code, which in the U.S. is called a zipcode. In the U.S., the postal code or zipcode is defined by five digits, for example 10005 or 98195. These are numbers that you could imagine multiplying by a coefficient, but they don't really behave like numeric values; they behave more like categorical values. For example, 98195 is not nine times bigger than 10005, it's just a different part of the country. So even numbers, if they don't behave like a continuous scale but behave more like an indicator of a category, like the location in this example, still have to be encoded in interesting ways if we're going to multiply them by some coefficient.

So the question is, how do we multiply coefficients like 1.5 or -2.7 with these categorical variables? To do this, we need to use what's called an encoding. An encoding takes an input which is categorical, for example country of birth, and tries to encode it using numerical values that can naturally be multiplied by some coefficients. So for example, for country of birth there might be 196 possible countries, or categories, that the value comes from. One way to encode this is what's called 1-hot encoding, where you create one feature for every possible country. So for example there might be a feature for Argentina, a feature for Brazil, and so on, all the way to a feature for Zimbabwe. If somebody's born in Brazil, then the feature for Argentina has value 0, the feature for Brazil has value 1, and all the other features have value 0. So only one of these features has value 1 at a time and everything else is 0, which is why it's called 1-hot; the term comes from electrical engineering, and it means that only one bit is on, or active, in the encoding. Similarly, if somebody's born in Zimbabwe, we get 0, 0, 0, 0, and just a 1 in the feature h196, which corresponds to a Zimbabwe birth. So that's one kind of encoding.

And implicitly in this module, we've actually explored a different kind of encoding for text data, which we discussed in the first course: the Bag of Words encoding. A review is defined by text, and that text can draw on, say, 10,000 different words, or many more, even millions. What Bag of Words does is take that text and encode it as counts. So, for example, I might associate h1 with the number of awesomes, h2 with the number of awfuls, and so on, all the way to, say, h10,000, which might be the number of sushis, that is, the number of times the word sushi appears. A particular data point might have 2 awesomes, 0 awfuls, 0 for a bunch of other words, and maybe 3 sushis. And so it becomes a really, really sparse 10,000-dimensional vector.
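To make the 1-hot idea a bit more concrete, here is a minimal sketch in Python. The short country list and the helper name one_hot are just illustrative assumptions; in practice you would build the list from all 196 possible countries.

# Minimal sketch of 1-hot encoding (illustrative; names are made up)
countries = ["Argentina", "Brazil", "Zimbabwe"]  # in practice, all 196 countries

def one_hot(value, categories):
    # One feature per category: 1 for the matching category, 0 everywhere else
    return [1 if category == value else 0 for category in categories]

print(one_hot("Brazil", countries))    # [0, 1, 0]
print(one_hot("Zimbabwe", countries))  # [0, 0, 1]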
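And here is a similar sketch of the Bag of Words encoding, assuming a tiny made-up vocabulary and naive whitespace tokenization; a real vocabulary would have 10,000 words or more, and a real tokenizer would handle punctuation and so on.

from collections import Counter

# Minimal sketch of Bag of Words (illustrative; the vocabulary is tiny on purpose)
vocabulary = ["awesome", "awful", "sushi"]  # in practice, ~10,000 words or more

def bag_of_words(text, vocabulary):
    # Count how many times each vocabulary word appears in the text
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return [counts[word] for word in vocabulary]

print(bag_of_words("awesome awesome sushi sushi sushi", vocabulary))  # [2, 0, 3]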
In both of these cases, we've taken a categorical input and defined a set of features, one for each possible category, that contains either a single 1 indicating which category is on, or a count. And we can feed these directly into the logistic regression model that we've discussed so far. These types of encodings are really fundamental in practice, and you should really familiarize yourself with them. [MUSIC]