The notation we've used so far doesn't have features associated with it, but just like in the regression course, we're going to introduce features from the very beginning. So we're going to have these functions h1 through hD, which define features we might extract from the data, and we're going to include the constant function, h0. In particular, the score is going to be w0 h0 + w1 h1 + w2 h2, all the way to wD hD. So a feature could be a constant (h0), it could be #awesome's (h1), or #awful's (h2). It could be some transformation, like the log of the number of awesome's times the number of bad's. Or, more realistically, it could be the TF-IDF of #awful's, which helps us emphasize words that are more distinctive or important. We looked at TF-IDF in the first course and explored it quite a bit, and we're going to revisit it in the next course.

So now we have this prediction score, which is the sum over the features of the coefficient wj times the feature hj. And we're going to use the shorthand w^T h(xi) to denote the score, so you'll see me do that a lot. w^T h(xi) denotes the score for that particular data point: if that score is greater than 0, we're going to say the review is positive, and if the score is less than 0, we're going to say it's negative.

Very good. So now we've introduced our model in a little bit more detail. We're always going to take the input data x and feed it through the feature-generating function, which might count the number of awesome's or compute TF-IDF values. We feed that to the machine learning model, which multiplies the features by the learned weights and outputs a value, the score. We then push the score through the sign function and output ŷ, which is +1 for positive reviews or -1 for negative reviews.
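To make this pipeline concrete, here is a minimal Python sketch of the scoring and prediction steps just described. The particular feature functions and the weight values are illustrative assumptions, not values from the course: h0 is the constant feature, h1 counts #awesome's, h2 counts #awful's, and the prediction is the sign of w^T h(x).

```python
import re

def extract_features(review_text):
    """h(x): map a raw review to a feature vector.

    h0 is the constant feature; h1 and h2 are simple word counts,
    standing in for whatever features (counts, TF-IDF, ...) are used.
    """
    words = re.findall(r"[a-z]+", review_text.lower())
    return [
        1.0,                     # h0: constant feature
        words.count("awesome"),  # h1: #awesome's
        words.count("awful"),    # h2: #awful's
    ]

# Hypothetical learned coefficients w0, w1, w2 (not from the course).
w = [0.5, 1.2, -2.1]

def predict(review_text):
    """Score(x) = w^T h(x); output y-hat = +1 if score > 0, else -1."""
    h = extract_features(review_text)
    score = sum(wj * hj for wj, hj in zip(w, h))
    return +1 if score > 0 else -1

print(predict("The sushi was awesome, the service was awesome"))  # +1
print(predict("Awful food and awful service"))                    # -1
```

The design mirrors the diagram in the lecture: the feature generator h(x) is kept separate from the scoring step, so swapping in richer features (e.g., TF-IDF) leaves the scoring and sign-prediction code unchanged.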