The notation we've used so far doesn't have features associated with it, but just like in the regression course, we're going to introduce features from the very beginning. So we're going to have these functions h1 through hD, which define features we might extract from the data, and we're going to include the constant function, h0. In particular, the score is going to be w0 h0 + w1 h1 + w2 h2, all the way to wD hD. So a feature could be a constant (h0), it could be #awesome's (h1), or #awful's (h2). It could be some transformation, like the log of the number of awesome's times the number of bad's. Or, more realistically, it could be the TF-IDF of #awful's, which helps us emphasize words that are more distinctive or important. We looked at TF-IDF in the first course and explored it quite a bit, and we're going to revisit it in the next course.

So now we have this prediction score, which is the sum over the features of the coefficient wj times the feature hj. And we're going to use the shorthand w^T h(xi) to denote the score, so you'll see me do that a lot. w^T h(xi) denotes the score for that particular data point: if that score is greater than 0, we're going to say the review is positive, and if the score is less than 0, we're going to say it's negative.

Very good. So now we've introduced our model in a little bit more detail. We're always going to take the input data x and feed it through the feature-generating function, which might count the number of awesome's or compute TF-IDF values. We feed that to the machine learning model, which multiplies the features by the learned weights and outputs a value, the score. We then push the score through the sign function and output ŷ, which is +1 for positive reviews or -1 for negative reviews.
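To make this pipeline concrete, here is a minimal Python sketch of the scoring and prediction steps just described. The particular feature functions and the weight values are illustrative assumptions, not values from the course: h0 is the constant feature, h1 counts #awesome's, h2 counts #awful's, and the prediction is the sign of w^T h(x).

```python
import re

def extract_features(review_text):
    """h(x): map a raw review to a feature vector.

    h0 is the constant feature; h1 and h2 are simple word counts,
    standing in for whatever features (counts, TF-IDF, ...) are used.
    """
    words = re.findall(r"[a-z]+", review_text.lower())
    return [
        1.0,                     # h0: constant feature
        words.count("awesome"),  # h1: #awesome's
        words.count("awful"),    # h2: #awful's
    ]

# Hypothetical learned coefficients w0, w1, w2 (not from the course).
w = [0.5, 1.2, -2.1]

def predict(review_text):
    """Score(x) = w^T h(x); output y-hat = +1 if score > 0, else -1."""
    h = extract_features(review_text)
    score = sum(wj * hj for wj, hj in zip(w, h))
    return +1 if score > 0 else -1

print(predict("The sushi was awesome, the service was awesome"))  # +1
print(predict("Awful food and awful service"))                    # -1
```

The design mirrors the diagram in the lecture: the feature generator h(x) is kept separate from the scoring step, so swapping in richer features (e.g., TF-IDF) leaves the scoring and sign-prediction code unchanged.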