So, having finished the preceding modules, I'm feeling pretty confident that I can come in, specify a model, and also specify an algorithm for how to fit that model. In doing that, I get some fitted function, and I know how to use that function to make predictions. So I go, I make predictions about the value of my house. I go to sell my house, and I make money. And I'm happy, right? I did a good job.

Well, maybe, maybe not. Maybe my predictions weren't that good, and so, as a result, the value that I listed my house for was inaccurate. And maybe I end up losing money as a result of that. So what we can think about is a measure of how much we're losing when we make a certain prediction.

So, for example, in the housing application, if we list the house value as too low, then maybe we get low offers, and that's a cost to me relative to having made a better prediction. Or if I list the value as too high, maybe people don't come see the house and I don't get any offers. Or maybe people notice that not many people are showing up to look at the house, and they make me a very low offer. So, again, I'm in the situation of being in a worse financial state having made a poor prediction of the value of my house.

So a question is, how much am I losing compared to having made perfect predictions? Of course, we can never make perfect predictions. The way in which the world works is really complicated, and we can't hope to perfectly model that, as well as the noise that's inherent in the process of any observations we might see. But let's just imagine that we could perfectly predict the value. Then we'd say, in that case, our loss is 0. We're not losing any money because we did perfectly.

So a question is, how do we formalize this notion of how much we're losing? And in machine learning, we do this by defining something called a loss function. And what the loss function specifies is the cost incurred when the true observation is y and I make some other prediction. So, a bit more explicitly, what we're gonna do is estimate our model parameters, and those are w hat. We're gonna use those to form predictions.
So this notation here, f sub w hat, is something we've equivalently written as f hat, but for reasons that we'll see later in this module, this notation is very convenient. And what it is, is our predicted value at some input x. And y is the true value. And this loss function, L, is somehow measuring the difference between these two things.

And there are a couple of ways in which we could define a loss function. Well, there are actually many, many ways, but I'm just gonna go through a couple of examples. And in particular, the examples that I'm gonna go through assume that the cost you incur by making an overestimate, relative to an underestimate, is exactly the same. So there's no difference in listing my house $1,000 too high relative to $1,000 too low. Okay, so we're assuming what's called a symmetric loss function in these examples.

And very common choices include something called absolute error, which just looks at the absolute value of the difference between your true value and your predicted value. And another common choice is something called squared error, where, instead of just looking at the absolute value, you look at the square of that difference. And so that means that you have a very high cost if that difference is large, relative to just absolute error. (A short code sketch of these two losses appears at the end of this section.)

So as we're going through this module, it's useful to keep in the back of your mind this quote by George Box, which says: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." Okay, so we've spent a lot of time defining different models, and now we're gonna have tools to assess the performance of these methods, to think about these questions of whether they can be useful to us in practice.
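To make the two symmetric losses concrete, here is a minimal sketch in Python. It is not part of the lecture, and the function and variable names (absolute_error, squared_error, y_hat) are just illustrative; it only assumes that y is the true value and y_hat is the model's prediction f_w_hat(x).

```python
# Illustrative sketch of the two symmetric loss functions discussed above.
# y is the true observed value; y_hat is the predicted value f_w_hat(x).

def absolute_error(y, y_hat):
    # L(y, f_w_hat(x)) = |y - f_w_hat(x)|
    return abs(y - y_hat)

def squared_error(y, y_hat):
    # L(y, f_w_hat(x)) = (y - f_w_hat(x))^2
    # Penalizes large differences much more heavily than absolute error.
    return (y - y_hat) ** 2

# Hypothetical example: true house value $500,000, predicted $480,000.
y, y_hat = 500_000, 480_000
print(absolute_error(y, y_hat))  # 20000
print(squared_error(y, y_hat))   # 400000000
```

Note that both losses treat a $20,000 overestimate and a $20,000 underestimate identically, which is exactly the symmetry assumption made in the examples above.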