[MUSIC] Hi, in this lesson we will talk about a major part of any competition: the metrics that are used to evaluate a solution. In this video, we'll discuss why there are so many metrics and why it is necessary to know which metric is used in a competition. In the following videos, we will study the difference between a loss and a metric, and we will go over optimization techniques for the most important and common metrics. In this course we focus on regression and classification, so we only discuss metrics for these tasks. For better understanding, we will also build a simple baseline for each metric, that is, the best constant to predict for that particular metric (for example, for mean squared error the best constant is the target mean, and for mean absolute error it is the target median).

So, metrics are an essential part of any competition: they are used to evaluate our submissions. Okay, but why do we have a different evaluation metric in each competition? That is because there are plenty of ways to measure the quality of an algorithm, and each company decides for itself what is the most appropriate way for its particular problem. For example, let's say an online shop is trying to maximize the effectiveness of its website. The thing is, you need to formalize what effectiveness means; you need to define a metric by which effectiveness is measured. It could be the number of times the website was visited, or the number of times something was ordered through the website. So a company usually decides for itself which quantity is most important to it and then tries to optimize it.

In a competition, the metric is fixed for us, and the models and competitors are ranked using it. To get a higher leaderboard score, you need to get a better metric score. That is basically the only thing in a competition we need to care about: how to get a better score. And so it is very important to understand how the metric works and how to optimize it efficiently. I want to stress that it is really important to optimize exactly the metric we are given in the competition, and not any other metric.

Consider an example: blue and red points represent objects of class zero and class one, respectively. Say we decided to use a linear classifier and came up with two metrics to optimize, M1 and M2. The question is, how different would the resulting classifiers be? Actually, by a lot. The two lines here, the solid one and the dashed one, show the best linear decision boundaries for the two cases. For the dashed boundary, the M1 score is the highest among all possible hyperplanes, but its M2 score is low. And we have the opposite situation for the solid boundary: its M2 score is the highest, whereas its M1 score is low. Now, if we know that in this particular competition the ranking is based on the M1 score, then we need to optimize the M1 score, and so we should submit the predictions of the model with the dashed boundary. Once again, if your model is scored with some metric, you get the best results by optimizing exactly that metric.

Now, the biggest problem is that some metrics cannot be optimized efficiently; that is, there is no simple enough way to find, say, the optimal hyperplane directly. That is why sometimes we need to train our model to optimize something different from the competition metric, and in that case we need to apply various heuristics to improve the competition metric score.
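To make this "surrogate loss plus heuristic" idea concrete, here is a minimal sketch; the dataset, the logistic regression model, and accuracy as the competition metric are all assumptions chosen for illustration, not something fixed by any particular competition. We train on log loss, which is easy to optimize, and then tune the decision threshold on a validation set to directly maximize the metric we are actually scored on.

```python
# Minimal sketch: optimize a smooth surrogate (log loss), then apply a simple
# heuristic -- threshold tuning on validation -- to improve the competition
# metric (accuracy here). Dataset, model and metric are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: train by minimizing log loss (easy to optimize directly).
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_val)[:, 1]

# Step 2: heuristic -- pick the probability threshold that maximizes the
# competition metric on the validation set instead of the default 0.5.
thresholds = np.linspace(0.01, 0.99, 99)
scores = [accuracy_score(y_val, proba > t) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best_t:.2f}, validation accuracy: {max(scores):.3f}")
```

The same pattern applies to other metrics as well: train on a loss that is convenient to optimize, then post-process the predictions (thresholding, scaling, calibration) to squeeze out a better score on the metric that actually ranks you.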
And there is another case where we need to be smart about the metric: when the train and test sets are different. In the lesson about leaks, we will discuss leaderboard probing. That is, we can check, for example, whether the mean target value on the public part of the test set is the same as on the train set. If it is not, we need to adapt our predictions to suit the test set better. This is basically a specific metric optimization technique we apply because train and test are different.

There can also be more severe cases, where an improved metric on the validation set does not result in an improved metric on the test set. In these situations, it is a good idea to stop and think: maybe there is a different way to approach the problem. In particular, time series can be very challenging to forecast. Even if you set up the validation just right, splitting by time with rolling windows, the distribution in the future can still be much different from what we had in the train set. Or sometimes there is just not enough training data, so a model cannot capture the patterns.

In one of the competitions I took part in, I had to use some tricks to boost my score after the modeling, and the trick was a consequence of the particular metric used in that competition. The metric was quite unusual, actually, but intuitive. If the trend is guessed correctly, then the absolute difference between the prediction and the target is taken as the error. If, for instance, the model predicts a value in the prediction horizon to be higher than the last value from the train set, but in reality it is lower, then the trend is predicted incorrectly, and the error is set to the squared absolute difference. So if we predict a value to be above the dashed line but it turns out to be below, or vice versa, the trend is considered to be predicted incorrectly. This metric therefore cares a lot more about predicting the correct trend than about the actual values you predict, and that is something it was possible to exploit.

There were several time series to forecast, the horizon to predict was long, and the model's predictions were unreliable. Moreover, it was not possible to optimize this metric directly. So I realized that it would be much better to set all the predictions to either the last known value plus a very tiny constant, or the last known value minus a very tiny constant, the same value for all the points in the time interval we had to predict, for each time series. And the sign depends on an estimate of what is more likely: that the values in the horizon are lower than the last known value, or higher. This trick actually took me to first place in that competition.

So finding a nice way to optimize a metric can give you an advantage over other participants, especially if the metric is peculiar. Maybe I should formulate it like this: we should not forget to do a kind of exploratory metric analysis along with exploratory data analysis, at least when the metric is an unusual one.

So in this video, we have understood that each business has its own way to measure the effectiveness of an algorithm based on its needs, and that is why there are so many different metrics. And we saw two motivational examples of why we should care about the metric: basically, because it is how competitors are compared to each other. In the following videos, we will talk about concrete metrics. We will first discuss the high-level intuition for each metric and then talk about optimization techniques. [MUSIC]