Hello everyone. This is Marios Michailidis, and we will continue our discussion of ensemble methods. Previously, we saw some simple averaging methods. This time, we'll discuss bagging, which is a very popular and efficient form of ensembling.

What is bagging? Bagging refers to averaging slightly different versions of the same model as a means to improve predictive power. A common and quite successful application of bagging is the random forest, where you run many different versions of decision trees in order to get a better prediction.

Why should we consider bagging? Generally, in the modeling process there are two main sources of error: errors due to bias, often referred to as underfitting, and errors due to variance, often referred to as overfitting. To understand this better, I'll give you two opposite examples: one with high bias and low variance, and one with high variance and low bias.

Let's take the example of high bias and low variance first. We have a person who is young, let's say less than 30 years old, we know this person is quite rich, and we're trying to predict whether this person will buy an expensive racing car. Our model has high bias if it says that because this person is young, he is not going to buy an expensive car. What the model has done here is that it hasn't explored the deeper relationships within the data: it doesn't matter that this person is young if he has a lot of money when it comes to buying a car. It hasn't explored different relationships. In other words, it has underfitted. However, this is also associated with low variance, because the relationship it did find, that a young person generally doesn't buy an expensive car, is broadly true, so we would expect this information to generalize well enough to unseen data. Therefore, the variance is low in this example.

Now, let's look at it the other way around, an example with high variance and low bias. Let's assume we have a person. His name is John, he lives in a green house, he has brown eyes, and we want to predict whether he will buy a car. A model that has gone this deep in order to find these relationships actually has low bias, because it has really explored a lot of information about the training data. However, it is making the mistake of assuming that every person with these characteristics is going to buy a car. Therefore, it generalizes from something it shouldn't. In other words, it has over-exhausted the information in the training data, and the patterns it has found are not significant. So here we actually have high variance but low bias.

If we were to visualize the relationship between prediction error and model complexity, it would look like this: when we begin training the model, the training error, that is, the error on the training data, gets reduced, and the same happens on the test data, because the predictions are simple and easily generalizable. However, after a point, any improvements in the training error are no longer reflected in the test data. This is the point where the model starts over-exhausting the information and creates predictions that are not generalizable.

This is where bagging comes into play and offers its greatest value. By making slightly different, or let's say randomized, models, we ensure that the predictions do not carry very high variance: they are generally more generalizable, and we don't over-exhaust the information in the training data.
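To make this picture concrete, here is a minimal sketch, not taken from the lecture, that assumes scikit-learn and a synthetic classification dataset: as a decision tree is allowed to grow deeper, its training error keeps dropping while its test error stops improving, which is exactly the point where variance takes over.

```python
# Illustrative sketch of underfitting vs. overfitting (high bias vs. high variance).
# The dataset and depth values below are assumptions chosen for demonstration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=1)

for depth in [1, 3, 10, None]:          # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1)
    tree.fit(X_train, y_train)
    print("depth:", depth,
          "train error:", round(1 - tree.score(X_train, y_train), 3),
          "test error:",  round(1 - tree.score(X_test, y_test), 3))
```

With the shallow tree, both errors are high (underfitting); with the unrestricted tree, the training error goes to roughly zero while the test error stops improving (overfitting).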
At the same time, we saw before that when we average slightly different models, we are generally able to get better predictions, and we can assume that across, say, 10 models we are still able to capture the significant information in the training data. This is why bagging tends to work quite well, and personally, I always use bagging. When I say, "I fit a model," I have actually not fit one model; I have fit a bagged version of that model, so probably 10 different models.

Which parameters are associated with bagging? The first is the seed. Many algorithms have some randomized procedures, so by changing the seed you ensure that the models are built slightly differently. At the same time, you can run a model on fewer rows (row sub-sampling), or you could use bootstrapping. Bootstrapping is different from row sub-sampling in the sense that you create an artificial dataset by sampling the training data with replacement, so a given row might appear, let's say, three or four times. You create a random dataset from the training data. A different form of randomness can be introduced with shuffling: some algorithms are sensitive to the order of the data, so by changing the order you ensure that the models become quite different. Another way is to take a random sample of columns, so you build models on different features or different variables of the data. Then you have model-specific parameters; for example, with a linear model you could build 10 different, let's say, logistic regressions with slightly different regularization parameters.

Obviously, you can also control the number of models you include in your ensemble, or, as we call them in this case, bags. Normally, we put a value of more than 10 here, but in principle adding more bags doesn't hurt you. It makes results better, but after some point performance starts plateauing, so there is a cost-benefit trade-off with time; still, in principle, more bags is generally better. Optionally, you can also apply parallelism: bagging models are independent of each other, which means you can build many of them at the same time and make full use of your computational power.

Now we can see an example of bagging, but before I do that, just to let you know, the bagging estimators that scikit-learn has in Python are actually quite good, so I recommend them. What follows is the typical 15 lines of code that I use quite often; they seem really simple, but they are actually quite efficient. Assuming you have a training and a test dataset and a target variable, what you do is specify some bagging parameters: which model am I going to use? A random forest. How many bags am I going to run? 10. What will be my seed? One. Then you create an empty object that will save the predictions, and then you run a loop for as many bags as you have specified. In this loop you repeat the same steps: you change the seed, you fit the model, you make predictions on the test data, and you save these predictions. Then you just take an average of these predictions. A sketch of this loop is included below.

This is the end of the session. In this session, we discussed bagging as a popular form of ensembling, we saw bagging in relation to variance and bias, and we also saw an example of how to use it. Thank you very much. In the next session we will describe boosting, which is also very popular, so stay tuned and have a good day.
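Below is a minimal sketch of the bagging loop described above. It assumes scikit-learn is available and uses a synthetic regression dataset; the data, the variable names (train, test, y), and the parameter values are illustrative assumptions, not the exact code shown in the lecture.

```python
# Sketch of the bagging loop described above: fit the same model several times
# with different seeds, predict on the test data each time, and average.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Synthetic train/test data standing in for a real dataset (assumption).
X, y_all = make_regression(n_samples=1000, n_features=10, random_state=1)
train, test, y, y_test = train_test_split(X, y_all, test_size=0.3, random_state=1)

# Bagging parameters: which model to use, how many bags, and the base seed.
model = RandomForestRegressor(n_estimators=100)
bags = 10
seed = 1

# An empty object that will accumulate the predictions of each bag.
bagged_prediction = np.zeros(test.shape[0])

for n in range(bags):
    model.set_params(random_state=seed + n)   # change the seed for every bag
    model.fit(train, y)                       # fit the model
    preds = model.predict(test)               # make predictions on the test data
    bagged_prediction += preds                # save (accumulate) these predictions

# Take the average of the predictions over all bags.
bagged_prediction /= bags
```

The same pattern works with any estimator that exposes a random_state parameter; scikit-learn's built-in BaggingClassifier and BaggingRegressor wrap a similar idea, adding row and column sub-sampling options.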