In the previous video, we started to discuss regression metrics. In this video, we'll talk about three more: (R)MSPE, MAPE, and (R)MSLE.

Think about the following problem: we need to predict how many laptops two shops will sell. In the train set for a particular date, we see that the first shop sold 10 items, and the second sold 1,000 items. Now suppose our model predicts 9 items instead of 10 for the first shop, and 999 instead of 1,000 for the second. It could happen that an off-by-one error in the first case is much more critical than in the second case. But MSE and MAE are equal to 1 for both shops' predictions, and thus according to those metrics these off-by-one errors are indistinguishable.

This is basically because MSE and MAE work with absolute errors, while relative errors can be more important for us. In relative terms, an off-by-one error for a shop that sells 10 items is equal to an error of 100 items for a shop that sells 1,000 items. On the plot for MSE and MAE, we can see that all the error curves have the same shape for every target value; the curves are just shifted versions of each other. That is an indicator that a metric works with absolute errors.

The relative error preference can be expressed with Mean Square Percentage Error, MSPE for short, or Mean Absolute Percentage Error, MAPE. If you compare them to MSE and MAE, you will notice the difference: for each object, the error is divided by the target value, giving a relative error.

MSPE and MAPE can also be thought of as weighted versions of MSE and MAE, respectively. For MAPE, the weight of a sample is inversely proportional to its target, while for MSPE it is inversely proportional to the target squared. Note that the weights do not sum up to one here.

You can take a look at the individual error plots for our sample dataset. Now we see that the curves become flatter as the target value increases. It means that the cost we pay for a fixed absolute error depends on the target value, and as the target increases, we pay less.

So, having talked about the definition and motivation behind MSPE and MAPE, let's now think: what are the optimal constant predictions for these metrics?
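Before that, to make the definitions above concrete, here is a minimal sketch of all four metrics applied to the two-shop example. The lecture itself shows no code, so Python with NumPy is an assumption on my part, and the percentage scaling follows the usual 100%/N convention:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mspe(y, y_hat):
    # squared relative error in percent; assumes strictly positive targets
    return 100.0 * np.mean(((y - y_hat) / y) ** 2)

def mape(y, y_hat):
    # absolute relative error in percent; assumes strictly positive targets
    return 100.0 * np.mean(np.abs(y - y_hat) / y)

y     = np.array([10.0, 1000.0])  # true sales of the two shops
y_hat = np.array([ 9.0,  999.0])  # off-by-one predictions for both

print(mse(y, y_hat), mae(y, y_hat))    # 1.0 1.0 -- indistinguishable
print(mspe(y, y_hat), mape(y, y_hat))  # the first shop's 10% error dominates
```

Both percentage metrics are driven almost entirely by the first shop, which is exactly the behavior the example calls for.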
Recall that for MSE, the optimal constant is the mean over the target values. Now, for MSPE, the weighted version of MSE, it turns out that the optimal constant is a weighted mean of the target values. For our dataset, the optimal value is about 6.6, and we see that it is biased towards the small targets, since the absolute errors for them get the highest weights and thus influence the metric the most.

Now, for MAPE, here is a question for you: what do you think the optimal constant is? Just use your intuition and the knowledge from the previous slides; in particular, recall that MAPE is a weighted version of MAE. The right answer: the best constant is the weighted median. It is not a very commonly used quantity, actually, so take a look at the reading materials for a bit of explanation. The optimal value here is 6, and it is even smaller than the constant for MSPE. But do not try to explain this through robustness to outliers: if an outlier had a very, very small value, MAPE would be heavily biased towards it, since that outlier would have the highest weight.

All right, now let's move on to the last metric in this video: Root Mean Square Logarithmic Error, or RMSLE for short. What is RMSLE? It is just RMSE calculated in logarithmic scale. In fact, to calculate it, we take a logarithm of our predictions and of the target values, and compute the RMSE between them. The targets are usually non-negative but can be equal to 0, and the logarithm of 0 is not defined. That is why a constant is usually added to the predictions and the targets before applying the logarithm. This constant can also be chosen to be different from one; it can be, for example, 300, depending on the organizer's needs, but for us it will not change much.

This metric is usually used in the same situations as MSPE and MAPE, as it also cares about relative errors more than about absolute ones. But note the asymmetry of the error curves: from the perspective of RMSLE, it is always better to predict more than the target than to predict the same amount less than it. Just as root mean square error does not differ much from mean square error, RMSLE can also be calculated without the root operation, but the rooted version is more widely used.
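To make the two baselines above concrete, here is a sketch of the weighted mean and weighted median computations. The lecture's dataset only appears on the slides, so the targets below are hypothetical and the resulting constants will not reproduce the 6.6 and 6 quoted above:

```python
import numpy as np

def mspe_best_constant(y):
    # MSPE is MSE weighted by 1 / y^2, so its best constant is
    # the weighted mean of the targets under those weights
    w = 1.0 / y ** 2
    return np.sum(w * y) / np.sum(w)

def mape_best_constant(y):
    # MAPE is MAE weighted by 1 / y, so its best constant is the
    # weighted median: sort the targets and take the first one at
    # which the cumulative (normalized) weight reaches one half
    w = 1.0 / y
    w /= w.sum()
    order = np.argsort(y)
    cum_w = np.cumsum(w[order])
    return y[order][np.searchsorted(cum_w, 0.5)]

y = np.array([1.0, 5.0, 6.0, 10.0, 90.0])  # hypothetical targets
print(np.mean(y), np.median(y))                      # MSE and MAE baselines
print(mspe_best_constant(y), mape_best_constant(y))  # pulled towards small targets
```

Both percentage baselines land far below the plain mean, which illustrates the bias towards small targets discussed above.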
It is important to know that the plot we see here on the slide is built for the version without the root, and for the rooted version an analogous plot would be misleading.

Now let's move on to the question about the best constant. I will let you guess the answer again; just recall what the best constant prediction for RMSE is, and use the connection between RMSLE and RMSE. To find the constant, we should realize that we can first find the best constant for RMSE in log space: it will be the mean of the target values in log space. After that, we need to get back from log space to the usual one with an inverse transform. The optimal constant turns out to be 9.1, which is higher than the constants for both MAPE and MSPE.

Here we see the optimal constants for the metrics we've broken down. MSE is quite biased towards the huge value from our dataset, while MAE is much less biased. MSPE and MAPE are biased towards smaller targets, because they assign higher weights to the objects with small targets. And RMSLE is frequently considered a better metric than MAPE, since it is less biased towards small targets yet still works with relative errors.

I strongly encourage you to work out the baseline for every metric you face for the first time. It truly helps to build an intuition and to find a way to optimize the metric.

So, in this video, we discussed three metrics that work with relative errors: MSPE, mean square percentage error; MAPE, mean absolute percentage error; and RMSLE, root mean squared logarithmic error. We discussed the definitions and the baseline solutions for them. In the next video, we will study several classification metrics.
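Following the speaker's suggestion to work out baselines yourself, here is one last sketch: RMSLE and its best constant prediction. The plus-one shift inside the logarithm is the common convention rather than something fixed by the lecture, and the targets are the same hypothetical ones as before, so the result will not match the 9.1 from the slides:

```python
import numpy as np

def rmsle(y, y_hat):
    # RMSE between log-shifted targets and predictions;
    # log1p(x) = log(1 + x), so zero targets are handled safely
    return np.sqrt(np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2))

def rmsle_best_constant(y):
    # mean in log space, mapped back with the inverse transform
    return np.expm1(np.mean(np.log1p(y)))

y = np.array([1.0, 5.0, 6.0, 10.0, 90.0])  # same hypothetical targets
c = rmsle_best_constant(y)
print(c)                             # about 8.7: between the percentage-metric
                                     # constants and the plain mean
print(rmsle(y, np.full_like(y, c)))  # no other constant scores lower
```

As in the lecture, the RMSLE baseline sits above the MSPE and MAPE constants but well below the mean that MSE prefers.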