[MUSIC] So we've seen the logistic regression model and explored it quite a bit, and we hinted at what learning means: finding the best parameters for those models. However, we've talked about features in a kind of abstract way. We said we have the number of awesomes, the number of awfuls, and so on. But we have to think a little harder when our inputs are categorical variables.

So let's take a little example. If our inputs x were numeric values like the number of awesomes, somebody's age, or somebody's salary, it's natural to multiply them by particular coefficients. So 1.5 times the number of awesomes makes sense, and 17 times your salary kind of makes sense as a numeric value in that score function. However, consider categorical inputs like whether somebody is male or female, their country of birth, or the postal code, which in the U.S. is called a zipcode. In the U.S., the postal code or zipcode is defined by five digits, for example 10005 or 98195. These are numbers that you could imagine multiplying by a coefficient, but they don't really behave like numeric values; they behave more like categorical values. For example, 98195 is not nine times bigger than 10005, it's just a different part of the country. So even numbers, if they don't behave like a continuous scale but behave more like an indicator of a category, like the location in this example, still have to be encoded in interesting ways if we're going to multiply them by some coefficient.

So the question is, how do we multiply coefficients like 1.5 or -2.7 with these categorical variables? To do this, we need to use what's called an encoding. An encoding takes an input which is categorical, for example country of birth, and tries to encode it using numerical values that can naturally be multiplied by some coefficients. So for example, for country of birth there might be 196 possible countries, or categories, that the value comes from. One way to encode this is what's called 1-hot encoding, where you create one feature for every possible country. So for example there might be a feature for Argentina, a feature for Brazil, and so on, all the way to a feature for Zimbabwe. If somebody's born in Brazil, then the feature for Argentina has value 0, the feature for Brazil has value 1, and all the other features have value 0. So only one of these features has value 1 at a time and everything else is 0, which is why it's called 1-hot; the term comes from electrical engineering, and it means that only one bit is on, or active, in the encoding. Similarly, if somebody's born in Zimbabwe, we get 0, 0, 0, 0, and just a 1 in the feature h196, which corresponds to a Zimbabwe birth. So that's one kind of encoding.

And implicitly in this module, we've actually explored a different kind of encoding for text data, which we discussed in the first course: the Bag of Words encoding. A review is defined by text, and that text can draw on, say, 10,000 different words, or many more, even millions. What Bag of Words does is take that text and encode it as counts. So, for example, I might associate h1 with the number of awesomes, h2 with the number of awfuls, and so on, all the way to, say, h10,000, which might be the number of sushis, that is, the number of times the word sushi appears. A particular data point might have 2 awesomes, 0 awfuls, 0 for a bunch of other words, and maybe 3 sushis. And so it becomes a really, really sparse 10,000-dimensional vector.
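To make the 1-hot idea a bit more concrete, here is a minimal sketch in Python. The short country list and the helper name one_hot are just illustrative assumptions; in practice you would build the list from all 196 possible countries.

# Minimal sketch of 1-hot encoding (illustrative; names are made up)
countries = ["Argentina", "Brazil", "Zimbabwe"]  # in practice, all 196 countries

def one_hot(value, categories):
    # One feature per category: 1 for the matching category, 0 everywhere else
    return [1 if category == value else 0 for category in categories]

print(one_hot("Brazil", countries))    # [0, 1, 0]
print(one_hot("Zimbabwe", countries))  # [0, 0, 1]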
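And here is a similar sketch of the Bag of Words encoding, assuming a tiny made-up vocabulary and naive whitespace tokenization; a real vocabulary would have 10,000 words or more, and a real tokenizer would handle punctuation and so on.

from collections import Counter

# Minimal sketch of Bag of Words (illustrative; the vocabulary is tiny on purpose)
vocabulary = ["awesome", "awful", "sushi"]  # in practice, ~10,000 words or more

def bag_of_words(text, vocabulary):
    # Count how many times each vocabulary word appears in the text
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return [counts[word] for word in vocabulary]

print(bag_of_words("awesome awesome sushi sushi sushi", vocabulary))  # [2, 0, 3]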
In both of these cases, we've taken a categorical input and defined a set of features, one for each possible category, that contains either a single 1 indicating which category is on, or a count. And we can feed these directly into the logistic regression model that we've discussed so far. These types of encodings are really fundamental in practice, and you should really familiarize yourself with them. [MUSIC]