In this video, we will discuss what a loss is, what a metric is, and what the difference between them is. Then we'll overview the general approaches to metric optimization.
Let's start with a comparison between the two notions, loss and metric. The metric, or target metric, is a function that we want to use to evaluate the quality of our model. For example, for a classification task, we may want to maximize the accuracy of our predictions, that is, how frequently the model outputs the correct label. But the problem is that no one really knows how to optimize accuracy efficiently. Instead, people come up with proxy loss functions. These are evaluation functions that are easy to optimize for a given model. For example, logarithmic loss is widely used as an optimization loss, while the accuracy score is how the solution is eventually evaluated. So, once again, the loss function is the function that our model optimizes and uses to evaluate the solution, and the target metric is how we want the solution to be evaluated. This is kind of an expectation versus reality thing.
Sometimes we are lucky and the model can optimize our target metric directly. For example, for the mean squared error metric, most libraries can optimize it out of the box, so the loss function is the same as the target metric. And sometimes we want to optimize metrics that are really hard or even impossible to optimize directly. In this case, we usually set the model to optimize a loss that is different from the target metric, but after the model is trained, we use hacks and heuristics to reduce the discrepancy and adjust the model to better fit the target metric. We will see examples for both cases in the following videos.
And the last thing to mention is that loss, metric, cost, objective and other notions are more or less used as synonyms. It is completely okay to say target loss and optimization metric, but we will fix the wording now for clarity.
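To make the loss versus metric distinction concrete, here is a minimal sketch in plain Python. The function names and the two sets of predicted probabilities are made up for illustration. Accuracy only looks at which side of the threshold a prediction falls on, so it is piecewise constant and gives an optimizer nothing to follow, while logarithmic loss changes smoothly with the predicted probabilities.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean logarithmic loss; smooth in the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Share of correct labels after thresholding; piecewise constant."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

# Illustrative values only (not from any real model).
y_true = [1, 0, 1, 1, 0]
model_a = [0.9, 0.1, 0.8, 0.7, 0.2]  # confident and correct
model_b = [0.6, 0.4, 0.6, 0.6, 0.4]  # barely correct

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, "accuracy:", accuracy(y_true, preds),
          "log loss:", round(log_loss(y_true, preds), 4))
```

Both sets of predictions get the same accuracy, but the log loss tells them apart, which is why it works as a proxy that a model can actually optimize.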
Okay, so far, we've understood why it's important to optimize the metric given in a competition, and we have discussed the difference between the optimization loss and the target metric. Now, let's overview the approaches to target metric optimization in general. The approaches can be broadly divided into several categories, depending on the metric we need to optimize.
Some metrics can be optimized directly. That is, we should just find a model that optimizes this metric and run it. In fact, all we need to do is set the model's loss function to this metric. The most common metrics, like MSE and Logloss, are implemented as loss functions in almost every library.
For some of the metrics that cannot be optimized directly, we can somehow pre-process the train set and use a model with a metric or loss function that is easy to optimize. For example, while the MSPE metric cannot be optimized directly with XGBoost, we will see later that we can resample the train set and optimize MSE loss instead, which XGBoost can optimize.
Sometimes we'll optimize an incorrect metric, but then post-process the predictions to fit the competition metric better.
For some models and frameworks, it's possible to define a custom loss function, and sometimes it's possible to implement a loss function that will serve as a nice proxy for the desired metric. For example, it can be done for quadratic-weighted Kappa, as we will see later. It's actually quite easy to define a custom loss function for XGBoost. We only need to implement a single function that takes the predictions and the target values and computes the first- and second-order derivatives of the loss function with respect to the model's predictions. For example, here you see one for the Logloss. Of course, the loss function should be smooth enough and have well-behaved derivatives, otherwise XGBoost will go crazy.
In this course, we consider only a small set of metrics, but there are plenty of them in fact.
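The custom loss recipe described above, a single function that returns first- and second-order derivatives, can be sketched for Logloss as follows. This is a standard-library sketch, not the code from the slide. It assumes the predictions are raw scores (margins) that still have to go through a sigmoid, and the names `sigmoid` and `logloss_grad_hess` are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logloss_grad_hess(raw_preds, labels):
    """Derivatives of Logloss with respect to the raw predictions.

    For p = sigmoid(z) and label y in {0, 1}:
      first derivative:  p - y
      second derivative: p * (1 - p)
    """
    grad, hess = [], []
    for z, y in zip(raw_preds, labels):
        p = sigmoid(z)
        grad.append(p - y)
        hess.append(p * (1.0 - p))
    return grad, hess

# Illustrative inputs only.
grad, hess = logloss_grad_hess([0.0, 2.0, -1.0], [1, 0, 1])
print(grad)
print(hess)

# With XGBoost's Python API, a function like this is typically wrapped as
# obj(preds, dtrain), reading the labels via dtrain.get_label() and returning
# NumPy arrays, and then passed through the obj argument of xgboost.train.
```

Note how the second derivative shrinks toward zero for very confident predictions. That is one way derivatives can become badly behaved, and it is the kind of thing to watch for when writing a custom loss.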
And for some of them, it is really hard to come up with a neat optimization procedure or write a custom loss function. Thankfully, there is a method that always works. It is called early stopping, and it is very simple. You set a model to optimize any loss function it can optimize, and you monitor the desired metric on a validation set. And you stop the training when the model starts to overfit according to the desired metric, and not according to the metric the model is truly optimizing. That is important.
Of course, some metrics cannot even be evaluated easily. For example, if the metric is based on human assessors' opinions, you cannot evaluate it on every iteration. For such metrics, we cannot use early stopping, but we will never find such metrics in a competition.
So, in this video, we have discussed the discrepancy between our target metric and the loss function that our model optimizes. We've reviewed several approaches to target metric optimization and, in particular, discussed early stopping. In the following videos, we will go through the regression and classification metrics and see the hacks we can use to optimize them.
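The early stopping procedure discussed in this video can be sketched as a small loop. This is only an illustration under assumptions: the helper `early_stopping`, its `patience` parameter, and the validation curve are all hypothetical, and the target metric is assumed to be one where higher is better. The model trains on whatever loss it can optimize, and the stopping decision looks only at the target metric on the validation set.

```python
def early_stopping(run_round, validation_metric, max_rounds=1000, patience=3):
    """Train round by round and stop once the target metric stops improving.

    run_round():         performs one more training iteration on the
                         optimization loss (whatever the model supports).
    validation_metric(): returns the target metric on the validation set
                         (assumed here: higher is better).
    Returns the best round and the best validation score seen.
    """
    best_score, best_round = float("-inf"), 0
    for rnd in range(1, max_rounds + 1):
        run_round()
        score = validation_metric()
        if score > best_score:
            best_score, best_round = score, rnd
        elif rnd - best_round >= patience:
            break  # no improvement for `patience` rounds in a row
    return best_round, best_score

# Toy stand-in for a real model: a made-up validation curve that improves
# and then degrades as the model starts to overfit the target metric.
curve = [0.60, 0.70, 0.75, 0.74, 0.76, 0.75, 0.74, 0.73, 0.72, 0.71]
state = {"round": 0}

def run_round():
    state["round"] += 1  # a real model would do one boosting/gradient step

def validation_metric():
    return curve[state["round"] - 1]

best_round, best_score = early_stopping(
    run_round, validation_metric, max_rounds=len(curve), patience=3
)
print("best round:", best_round, "score:", best_score)
```

In practice, one would also keep the model as it was at the best round, or retrain with that number of rounds, since the model at the moment of stopping is already past its best point.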