Now, I'll give you a few tips that are going to help you make a good ensemble, or at least help you get started. As I mentioned before when I talked about stacking, what's quite important is to introduce diversity, and one way to do this is through the algorithms you use. An architecture that I have found quite useful is to always make two or three gradient boosted trees. What I do here is either try different implementations, or make one with big depth, one with middle depth, and one with low depth, and then tune the parameters around them so that they have as similar performance as possible, hoping that the end result will be models which are fairly diverse from each other. Another thing I like is to use neural nets, either from Keras or PyTorch. Again, I try to make one which is fairly deep, say three hidden layers, one which is in the middle, with two hidden layers, and one that has only one hidden layer. Again, I try to diversify, to make them slightly different, in order to extract new information. Then I use a few ExtraTrees or Random Forests; most of the time they work quite well and normally add value. I also add a few linear models, such as ridge regression, and I like the linear support vector machine from scikit-learn. KNN models tend to add quite nice value in many problems, which is surprising, because if you look at them individually they rarely have good performance compared to, say, an xgboost, yet they generally add quite some value in a meta-modelling context. I personally like factorization machines; I find them quite useful, especially libFM, which factorizes all pairwise interactions. And if your data permits, so if your data is not too big, I also find support vector machines with some sort of non-linear kernel, like RBF, useful. They do work quite well, especially in regression.

The other way you can introduce diversity is through the transformations you apply to your input data. You can run exactly the same models, and having slightly different input data is enough to generate diversity. For categorical features, I try one-hot encoding; label encoding, as in replacing the categorical values with an index; likelihood encoding, also called target encoding; or replacing categories with frequencies or counts. For numerical features, I try taking care of outliers, or not; I bin the variables into ranges, from x to y, from y to z, and so on; I use derivatives, which are one way to smooth your variables; or I use percentiles or scaling. These are different ways I change my numerical features across different models. Then I also explore interactions, like column one multiplied by column two, or column one plus column two. I can go up to three or four levels deep to explore all possible interactions. Another way to explore interactions is with groupby statements, where, for example, given all categories of a categorical feature, you compute the average of another variable. In certain situations this works quite well. The other thing I do is try unsupervised techniques like k-means, SVD, or PCA on the numerical features. Again, this tends to add value quite often.
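To make this first-layer diversity concrete, here is a minimal sketch of such a model library in scikit-learn, assuming a generic tabular regression problem. The MLPRegressor nets stand in for the Keras/PyTorch models mentioned above so the example stays self-contained, and every hyperparameter value is an illustrative placeholder, not a tuned recommendation.

```python
# A minimal sketch of a diverse first layer for a tabular regression task.
# All hyperparameter values are illustrative placeholders, not tuned advice.
from sklearn.ensemble import (GradientBoostingRegressor,
                              ExtraTreesRegressor, RandomForestRegressor)
from sklearn.linear_model import Ridge
from sklearn.svm import LinearSVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

first_layer = [
    # Two or three gradient boosted trees with low, middle, and big depth,
    # tuned so their individual scores end up roughly similar.
    GradientBoostingRegressor(max_depth=2, n_estimators=600, learning_rate=0.05),
    GradientBoostingRegressor(max_depth=5, n_estimators=300, learning_rate=0.05),
    GradientBoostingRegressor(max_depth=9, n_estimators=150, learning_rate=0.05),
    # Neural nets with three, two, and one hidden layers (here via
    # scikit-learn; the lecture mentions Keras or PyTorch).
    MLPRegressor(hidden_layer_sizes=(128, 64, 32), max_iter=500),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500),
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=500),
    # ExtraTrees / Random Forest, linear models, and kNN for extra diversity.
    ExtraTreesRegressor(n_estimators=500),
    RandomForestRegressor(n_estimators=500),
    Ridge(alpha=10.0),
    LinearSVR(C=0.1),
    KNeighborsRegressor(n_neighbors=32),
]
```

Each of these would then produce out-of-fold predictions that feed the next layer.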
Now, at every subsequent layer, what you need to keep in mind is that you need to make your algorithms smaller, shallower, more constrained. What does this mean? In gradient boosted trees, it means you need to use very small depth, like two or three. In linear models, you need to apply high regularization. ExtraTrees work quite well here; just don't make them too big. For shallow neural networks, again, I normally use one layer, maximum two, with not that many hidden neurons. You can try kNN with Bray-Curtis distance, or sometimes it's actually better to brute-force the best linear weights: within cross-validation, search for the weighted average of the models that gives the best score.

You can also deploy different feature engineering at this subsequent level. One thing you can do is create pairwise differences between the models' predictions. This helps because the predictions tend to be quite correlated, so when you create the differences you essentially force the meta-model to focus on what each model brings that is new. You can also create row-wise statistics, like the average or standard deviation across all the models; this is almost like an ensemble in itself, you create some ensemble features yourself. You can also deploy standard feature selection techniques: any technique you would use to find which features are important can be used here to find which models are important, and to exclude the rest.
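As a rough sketch of this second-level feature engineering, assume `oof_preds` is an (n_samples, n_models) array of out-of-fold first-layer predictions and `y` is the training target; the `meta_features` helper is hypothetical, not part of any library, and the Ridge meta-model with high alpha just illustrates the "constrained meta-model" advice above.

```python
# A rough sketch of second-level feature engineering, assuming `oof_preds`
# is an (n_samples, n_models) array of out-of-fold first-layer predictions.
# `meta_features` is a hypothetical helper, not part of any library.
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

def meta_features(oof_preds):
    feats = [oof_preds]
    # Pairwise differences: base predictions are often highly correlated,
    # so differences push the meta-model toward what each model adds.
    for i, j in combinations(range(oof_preds.shape[1]), 2):
        feats.append((oof_preds[:, i] - oof_preds[:, j]).reshape(-1, 1))
    # Row-wise statistics across models: hand-made ensemble features.
    feats.append(oof_preds.mean(axis=1, keepdims=True))
    feats.append(oof_preds.std(axis=1, keepdims=True))
    return np.hstack(feats)

# A constrained meta-model, in line with the advice above: a linear model
# with high regularization. `y` is the assumed training target.
# meta_model = Ridge(alpha=100.0).fit(meta_features(oof_preds), y)
```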
A rule of thumb that I've empirically found to work quite well, not with 100 percent confidence but as a general idea, is that for every 7.5 models in one layer, we add one model in the subsequent layer. So if we have seven models, we'll have one meta-model; if we have 15 models, we will have two meta-models, and so on. What we need to be very mindful of is that, even though we use this hold-out mechanism, we might still introduce leakage. We can control this by selecting the right K, the K I mentioned in the cross-validation. When we select a very high value there, each model uses more training data when it makes its predictions, and therefore it might not generalize very well; at the same time, it exhausts more of the information in the training data. So again, there is a bias-variance trade-off that you try to balance. There is no easy way to spot a mistake here. Normally you have a test data set, and if you see an improvement in your cross-validation that you don't see on your test data, then you need to go back and try to reduce the number of folds K. Hopefully, this will generalize better; at least that's an approach that has worked in practice.

I'll list a few pieces of software you can use for stacking. One is StackNet, which is the product of my research; you can give it a shot if you want. Another thing you can try is Stacked Ensembles from H2O. There is also newer software called Xcessiv, in Python, where you can also create quite diverse and large ensembles. A few more things to know about StackNet, if you want to use it: it now supports many of the common machine learning tools we use, like xgboost, lightgbm, and H2O, so you can pretty much have all the great tools available to build a strong ensemble. An interesting addition, which we didn't have much chance to discuss but which is nevertheless quite interesting, is that you can run classifiers in a regression problem, and vice versa. This means that instead of predicting age, I could be predicting whether this person will live more than 50 years. This tends to work quite well, because it makes the model focus on certain areas, and the meta-model will be able to utilize this information to make better predictions for age. So this tends to work quite well; I've found it useful quite often, and it is something you should explore.
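Here is a minimal sketch of that idea, assuming features `X` and a regression target `age` as numpy arrays; the 50-year threshold and the gradient-boosting models are illustrative assumptions, not a prescribed recipe.

```python
# A minimal sketch of running a classifier inside a regression problem,
# assuming features `X` and a regression target `age` as numpy arrays.
# The 50-year threshold and model choices are illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import (GradientBoostingClassifier,
                              GradientBoostingRegressor)

# Recast "predict age" as "will this person live more than 50 years?"
y_binary = (age > 50).astype(int)

# Out-of-fold class probabilities become an extra meta-feature next to
# the usual out-of-fold regression predictions.
clf_feature = cross_val_predict(
    GradientBoostingClassifier(), X, y_binary, cv=5, method="predict_proba"
)[:, 1]
reg_feature = cross_val_predict(GradientBoostingRegressor(), X, age, cv=5)

# The meta-model sees both views of the target and can combine them.
meta_X = np.column_stack([reg_feature, clf_feature])
```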
Generally, the software already has many top-10 solutions behind it, and not just by me, so it has been tested. In the examples section, I think there is a really interesting example with a very popular Kaggle competition which was hosted by Amazon. This example uses StackNet, and you can see how you can get into the top 10. In principle, this is a very nice competition: it doesn't have very big data, nor very small, and you can try lots of transformations, especially with the categorical data. It is a very good place to start.

The other thing I wanted to say is that StackNet also has an educational flavor; I made it with this focus in mind. If you go to the parameters section, where it lists all the different tools, you can find sections that tell you which parameters are the most important, based on my experience. For example, in xgboost, num_round is important, eta is important. You can then take this information and use it even outside StackNet, for example if you want to use these tools from Python, because the parameters are generally the same. So if you don't know where to start, you can look here, see which parameters are important, and focus on them in order to get a good result.

Before we close this session, there are a few things I'd like to tell you. Go out there and apply what you've learned; there is no such thing as learning only theoretically. The best way to learn is to bleed on the battlefield. Choose a competition; you may start with some which are labelled as tutorials, and then you can go on to the real competitions. This is how you learn. You need to get practical experience, obviously. Don't be demoralized if you see there's still a gap between you and the top people, because it does take some time to adjust: you need to learn the dynamics, understand how to work, where to focus your effort, and how to maximize its intensity. So it takes a bit of time, but you'll get there. My main point is, don't get disappointed; you'll definitely get there.

Something that has always helped me is to save my code, in particular the parts that worked well by the end of a competition, because I can then take this code into the next competition and try to improve it. This helps to gradually build a much stronger pipeline, and at the same time it saves time. Something that has helped me personally is to seek collaborations. There are, I think, two elements to it. One is that you definitely improve your result, because every person sees the problem from different angles and is able to extract different information, so when you join forces the score is better. But it is also more fun, and since you're doing this, you might as well enjoy it. The other thing I need to highlight is that you should stay connected with the forums, the code, and the kernels, because there might be tips or cutting-edge solutions that come out which can significantly shift a leaderboard. So generally, keep reading, and have that in mind.

This is the last video in a series where we have examined ensemble methods. We looked at simple averaging, then we went on to bagging, boosting, stacking, and multi-layer stacking. Hopefully, you found this useful. Thank you for bearing with me all this time; I also enjoyed it. Now go out there, make us proud, and who knows? The next top person on the leaderboard might be you.