In the previous video, we started to discuss regression metrics. In this video, we'll talk about three more: (R)MSPE, MAPE, and (R)MSLE.

Think about the following problem: we need to predict how many laptops two shops will sell. In the train set for a particular date, we see that the first shop sold 10 items, and the second sold 1,000 items. Now suppose our model predicts 9 items instead of 10 for the first shop, and 999 instead of 1,000 for the second. It could happen that an off-by-one error in the first case is much more critical than in the second case. But MSE and MAE are equal to 1 for both shops' predictions, and thus according to those metrics these off-by-one errors are indistinguishable.

This is basically because MSE and MAE work with absolute errors, while relative errors can be more important for us. In relative terms, an off-by-one error for a shop that sells 10 items is equal to an error of 100 items for a shop that sells 1,000 items. On the plot for MSE and MAE, we can see that all the error curves have the same shape for every target value; the curves are just shifted versions of each other. That is an indicator that a metric works with absolute errors.

The relative error preference can be expressed with Mean Square Percentage Error, MSPE for short, or Mean Absolute Percentage Error, MAPE. If you compare them to MSE and MAE, you will notice the difference: for each object, the error is divided by the target value, giving a relative error.

MSPE and MAPE can also be thought of as weighted versions of MSE and MAE, respectively. For MAPE, the weight of a sample is inversely proportional to its target, while for MSPE it is inversely proportional to the target squared. Note that the weights do not sum up to one here.

You can take a look at the individual error plots for our sample dataset. Now we see that the curves become flatter as the target value increases. It means that the cost we pay for a fixed absolute error depends on the target value, and as the target increases, we pay less.

So, having talked about the definition and motivation behind MSPE and MAPE, let's now think: what are the optimal constant predictions for these metrics?
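Before that, to make the definitions above concrete, here is a minimal sketch of all four metrics applied to the two-shop example. The lecture itself shows no code, so Python with NumPy is an assumption on my part, and the percentage scaling follows the usual 100%/N convention:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mspe(y, y_hat):
    # squared relative error in percent; assumes strictly positive targets
    return 100.0 * np.mean(((y - y_hat) / y) ** 2)

def mape(y, y_hat):
    # absolute relative error in percent; assumes strictly positive targets
    return 100.0 * np.mean(np.abs(y - y_hat) / y)

y     = np.array([10.0, 1000.0])  # true sales of the two shops
y_hat = np.array([ 9.0,  999.0])  # off-by-one predictions for both

print(mse(y, y_hat), mae(y, y_hat))    # 1.0 1.0 -- indistinguishable
print(mspe(y, y_hat), mape(y, y_hat))  # the first shop's 10% error dominates
```

Both percentage metrics are driven almost entirely by the first shop, which is exactly the behavior the example calls for.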
Recall that for MSE, the optimal constant is the mean over the target values. Now, for MSPE, the weighted version of MSE, it turns out that the optimal constant is a weighted mean of the target values. For our dataset, the optimal value is about 6.6, and we see that it is biased towards the small targets, since the absolute errors for them get the highest weights and thus influence the metric the most.

Now, for MAPE, here is a question for you: what do you think the optimal constant is? Just use your intuition and the knowledge from the previous slides; in particular, recall that MAPE is a weighted version of MAE. The right answer: the best constant is the weighted median. It is not a very commonly used quantity, actually, so take a look at the reading materials for a bit of explanation. The optimal value here is 6, and it is even smaller than the constant for MSPE. But do not try to explain this through robustness to outliers: if an outlier had a very, very small value, MAPE would be heavily biased towards it, since that outlier would have the highest weight.

All right, now let's move on to the last metric in this video: Root Mean Square Logarithmic Error, or RMSLE for short. What is RMSLE? It is just RMSE calculated in logarithmic scale. In fact, to calculate it, we take a logarithm of our predictions and of the target values, and compute the RMSE between them. The targets are usually non-negative but can be equal to 0, and the logarithm of 0 is not defined. That is why a constant is usually added to the predictions and the targets before applying the logarithm. This constant can also be chosen to be different from one; it can be, for example, 300, depending on the organizer's needs, but for us it will not change much.

This metric is usually used in the same situations as MSPE and MAPE, as it also cares about relative errors more than about absolute ones. But note the asymmetry of the error curves: from the perspective of RMSLE, it is always better to predict more than the target than to predict the same amount less than it. Just as root mean square error does not differ much from mean square error, RMSLE can also be calculated without the root operation, but the rooted version is more widely used.
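To make the two baselines above concrete, here is a sketch of the weighted mean and weighted median computations. The lecture's dataset only appears on the slides, so the targets below are hypothetical and the resulting constants will not reproduce the 6.6 and 6 quoted above:

```python
import numpy as np

def mspe_best_constant(y):
    # MSPE is MSE weighted by 1 / y^2, so its best constant is
    # the weighted mean of the targets under those weights
    w = 1.0 / y ** 2
    return np.sum(w * y) / np.sum(w)

def mape_best_constant(y):
    # MAPE is MAE weighted by 1 / y, so its best constant is the
    # weighted median: sort the targets and take the first one at
    # which the cumulative (normalized) weight reaches one half
    w = 1.0 / y
    w /= w.sum()
    order = np.argsort(y)
    cum_w = np.cumsum(w[order])
    return y[order][np.searchsorted(cum_w, 0.5)]

y = np.array([1.0, 5.0, 6.0, 10.0, 90.0])  # hypothetical targets
print(np.mean(y), np.median(y))                      # MSE and MAE baselines
print(mspe_best_constant(y), mape_best_constant(y))  # pulled towards small targets
```

Both percentage baselines land far below the plain mean, which illustrates the bias towards small targets discussed above.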
It is important to know that the plot we see here on the slide is built for the version without the root, and for the rooted version an analogous plot would be misleading.

Now let's move on to the question about the best constant. I will let you guess the answer again; just recall what the best constant prediction for RMSE is, and use the connection between RMSLE and RMSE. To find the constant, we should realize that we can first find the best constant for RMSE in log space: it will be the mean of the target values in log space. After that, we need to get back from log space to the usual one with an inverse transform. The optimal constant turns out to be 9.1, which is higher than the constants for both MAPE and MSPE.

Here we see the optimal constants for the metrics we've broken down. MSE is quite biased towards the huge value from our dataset, while MAE is much less biased. MSPE and MAPE are biased towards smaller targets, because they assign higher weights to the objects with small targets. And RMSLE is frequently considered a better metric than MAPE, since it is less biased towards small targets yet still works with relative errors.

I strongly encourage you to work out the baseline for every metric you face for the first time. It truly helps to build an intuition and to find a way to optimize the metric.

So, in this video, we discussed three metrics that work with relative errors: MSPE, mean square percentage error; MAPE, mean absolute percentage error; and RMSLE, root mean squared logarithmic error. We discussed the definitions and the baseline solutions for them. In the next video, we will study several classification metrics.
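Following the speaker's suggestion to work out baselines yourself, here is one last sketch: RMSLE and its best constant prediction. The plus-one shift inside the logarithm is the common convention rather than something fixed by the lecture, and the targets are the same hypothetical ones as before, so the result will not match the 9.1 from the slides:

```python
import numpy as np

def rmsle(y, y_hat):
    # RMSE between log-shifted targets and predictions;
    # log1p(x) = log(1 + x), so zero targets are handled safely
    return np.sqrt(np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2))

def rmsle_best_constant(y):
    # mean in log space, mapped back with the inverse transform
    return np.expm1(np.mean(np.log1p(y)))

y = np.array([1.0, 5.0, 6.0, 10.0, 90.0])  # same hypothetical targets
c = rmsle_best_constant(y)
print(c)                             # about 8.7: between the percentage-metric
                                     # constants and the plain mean
print(rmsle(y, np.full_like(y, c)))  # no other constant scores lower
```

As in the lecture, the RMSLE baseline sits above the MSPE and MAPE constants but well below the mean that MSE prefers.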