[SOUND] In the previous video, we started to discuss regression metrics. In this video, we'll talk about three more metrics: (R)MSPE, MAPE, and (R)MSLE.

Think about the following problem: we need to predict how many laptops two shops will sell. In the train set for a particular date, we see that the first shop sold 10 items and the second sold 1,000 items. Now suppose our model predicts 9 items instead of 10 for the first shop, and 999 instead of 1,000 for the second. It could happen that the off-by-one error in the first case is much more critical than in the second. But MSE and MAE are equal to one for both shops' predictions, and thus, according to those metrics, these off-by-one errors are indistinguishable. This is basically because MSE and MAE work with absolute errors, while relative error can be more important for us: in relative terms, an off-by-one error for a shop that sells 10 items is equivalent to an error of 100 items for a shop that sells 1,000. On the plot for MSE and MAE, we can see that all the error curves have the same shape for every target value; the curves are just shifted versions of each other. That is an indicator that the metric works with absolute errors.

The relative error preference can be expressed with Mean Square Percentage Error, MSPE for short, or Mean Absolute Percentage Error, MAPE. If you compare them to MSE and MAE, you will notice the difference: for each object, the error is divided by the target value, giving a relative error. MSPE and MAPE can also be thought of as weighted versions of MSE and MAE, respectively. For MAPE, the weight of a sample is inversely proportional to its target, while for MSPE, it is inversely proportional to the target squared. Note that the weights do not sum up to one here. Take a look at the individual error plots for our sample dataset: the curves become flatter as the target value increases. It means that the cost we pay for a fixed absolute error depends on the target value, and as the target increases, we pay less.

So, having talked about the definition and motivation behind MSPE and MAPE, let's now think about what the optimal constant predictions for these metrics are. Recall that for MSE, the optimal constant is the mean over the target values. Now, for MSPE, the weighted version of MSE, it turns out that the optimal constant is the weighted mean of the target values. For our dataset, the optimal value is about 6.6, and we see that it is biased towards the small targets, since the absolute errors for them are weighted with the highest weights and thus influence the metric the most.

Now, for MAPE, this is a question for you: what do you think is the optimal constant for it? Just use your intuition here and the knowledge from the previous slides; in particular, recall that MAPE is a weighted version of MAE. The right answer: the best constant is the weighted median. It is not a very commonly used quantity, actually, so take a look at the reading materials for a bit of explanation. The optimal value here is 6, and it is even smaller than the constant for MSPE. But do not try to explain it using outliers: if an outlier had a very, very small value, MAPE would be very biased towards it, since this outlier would have the highest weight.
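To make these definitions and baselines concrete, here is a minimal sketch. The formulas follow the lecture: MSPE averages the squared relative errors ((y_i - y_hat_i) / y_i)**2, and MAPE averages the absolute relative errors |y_i - y_hat_i| / y_i. Note that the toy target array and the `weighted_median` helper are our own illustrations, not the lecture's actual dataset or a library function.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean Square Percentage Error (as a fraction; multiply by 100 for percent)."""
    return np.mean(((y_true - y_pred) / y_true) ** 2)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (as a fraction)."""
    return np.mean(np.abs(y_true - y_pred) / y_true)

def weighted_median(values, weights):
    """Smallest value whose cumulative normalized weight reaches 0.5 (illustrative helper)."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cdf = np.cumsum(weights) / np.sum(weights)
    return values[np.searchsorted(cdf, 0.5)]

y = np.array([1.0, 2.0, 5.0, 10.0, 90.0])  # hypothetical toy targets

# MSPE weights each squared error by 1 / y**2, so its best constant
# prediction is the weighted mean of the targets with those weights.
w = 1.0 / y**2
best_const_mspe = np.sum(w * y) / np.sum(w)

# MAPE weights each absolute error by 1 / y, so its best constant
# prediction is the weighted median of the targets with those weights.
best_const_mape = weighted_median(y, 1.0 / y)
```

On these toy targets, both constants are pulled towards the smallest values, exactly the bias described above.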
All right, now let's move on to the last metric in this video: Root Mean Square Logarithmic Error, or RMSLE for short. What is RMSLE? It is just RMSE calculated in logarithmic scale. In fact, to calculate it, we take the logarithm of our predictions and of the target values, and compute RMSE between them. The targets are usually non-negative but can be equal to 0, and the logarithm of 0 is not defined. That is why a constant is usually added to the predictions and the targets before applying the logarithmic operation. This constant can also be chosen to be different from one; it can be, for example, 300, depending on the organizer's needs, but for us it will not change much.

So, this metric is usually used in the same situations as MSPE and MAPE, as it also cares about relative errors more than about absolute ones. But note the asymmetry of the error curves: from the perspective of RMSLE, it is always better to predict more than the target than to predict the same amount less than the target. Just as root mean square error does not differ much from mean square error, RMSLE can be calculated without the root operation, but the rooted version is more widely used. It is important to know that the plot we see here on the slide is built for the version without the root; for the rooted version, an analogous plot would be misleading.

Now let's move on to the question about the best constant. I will let you guess the answer again: just recall what the best constant prediction for RMSE is, and use the connection between RMSLE and RMSE. To find the constant, we should realize that we can first find the best constant for RMSE in the log space, which is the mean of the targets in the log space, and after that, we need to get back from the log space to the usual one with the inverse transform. The optimal constant turns out to be 9.1, which is higher than the constants for both MAPE and MSPE.

Here we see the optimal constants for the metrics we've broken down. MSE is quite biased towards the huge value in our dataset, while MAE is much less biased. MSPE and MAPE are biased towards the smaller targets, because they assign higher weights to the objects with small targets. And RMSLE is frequently considered a better metric than MAPE, since it is less biased towards small targets, yet still works with relative errors.

I strongly encourage you to think about the baselines for the metrics you face for the first time. It truly helps to build an intuition and to find a way to optimize the metric. So, in this video, we discussed metrics that work with relative errors: MSPE, mean square percentage error; MAPE, mean absolute percentage error; and RMSLE, root mean squared logarithmic error. We discussed their definitions and the baseline solutions for them. In the next video, we will study several classification metrics. [MUSIC]
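To tie the baselines together, here is a minimal sketch of RMSLE and of the constant predictions compared at the end of the video, again assuming the same hypothetical toy targets and the common +1 shift rather than the lecture's actual dataset.

```python
import numpy as np

def rmsle(y_true, y_pred, shift=1.0):
    """RMSE computed in log space; `shift` is the constant added before the log."""
    return np.sqrt(np.mean((np.log(y_pred + shift) - np.log(y_true + shift)) ** 2))

y = np.array([1.0, 2.0, 5.0, 10.0, 90.0])  # hypothetical toy targets

# Baselines from the video, for comparison:
best_const_mse = np.mean(y)    # MSE: the mean, pulled towards the huge target
best_const_mae = np.median(y)  # MAE: the median, much less biased

# RMSLE: the best constant for RMSE in log space is the mean of log(y + 1);
# map it back to the original scale with the inverse transform exp(.) - 1.
best_const_rmsle = np.exp(np.mean(np.log(y + 1.0))) - 1.0
```

On these toy targets, the RMSLE constant lands between the MAE and MSE baselines and above the MSPE and MAPE constants from the earlier sketch, matching the ordering discussed in the video.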