Hi, everyone. The main topic of this video is feature interactions. You will learn how to construct them and use them in problem solving. Additionally, we will discuss how to extract features from decision trees.

Let's start with an example. Suppose that we are building a model to predict the best advertisement banner to display on a website. Among the available features, there are two categorical ones that we will concentrate on: the category of the advertising banner itself and the category of the site the banner will be shown on. Certainly, we can use these as two independent features, but a really important feature is the combination of them. We can explicitly construct the combination in order to incorporate our knowledge into the model. Let's construct a new feature named ad_site that represents the combination. It will be categorical like the original ones, but the set of its values will be all possible combinations of the two original values.

From a technical point of view, there are two ways to construct such an interaction. Let's look at a simple example. Suppose our first feature, f1, has values A or B, another feature, f2, has values X, Y, or Z, and our data set consists of four data points. The first approach is to concatenate the text values of f1 and f2 and use the result as a new categorical feature f_join; we can then apply one-hot encoding to it. The second approach consists of two steps: first, apply one-hot encoding to features f1 and f2 separately; second, construct a new matrix by multiplying each column from the f1-encoded matrix with each column from the f2-encoded matrix. It is worth noting that both methods result in practically the same new feature representation. Both approaches are illustrated in the code sketch below.

The example above deals with interactions between categorical features, but similar ideas can be applied to real-valued features. For example, having two real-valued features f1 and f2, an interaction between them can be obtained by multiplying f1 and f2. In fact, we are not limited to multiplication: any function taking two arguments, like sum, difference, or division, is fine. Such transformations significantly enlarge the feature space and make learning easier, but keep in mind that they make overfitting easier too. It should be emphasized that tree-based algorithms such as random forest or gradient boosted decision trees find it difficult to extract such dependencies on their own. That's why constructing these transformations explicitly is very effective for tree-based methods.

Let's discuss practical details now. Pairwise feature generation greatly increases the number of features: if there were n original features, there will be about n squared, and even more if several types of interaction are used. There are two ways to moderate this: either do feature selection or dimensionality reduction. I prefer doing selection, since often only a few interactions achieve the same quality as all combinations of features. For each type of interaction, I construct all pairwise feature interactions, fit a random forest over them, and select the several most important features. Because the number of resulting features for each type is relatively small, it is possible to join them together along with the original features and use them as input for any machine learning algorithm, usually a tree-based one.

During the video, we have examined a method to construct second order interactions, but you can similarly produce third order or higher.
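To make the two construction approaches concrete, here is a minimal sketch in Python (not part of the original lecture) using pandas, NumPy, and scikit-learn. The four concrete rows and column names are placeholders chosen to match the toy example above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data set: four objects with two categorical features, as in the example.
# The concrete row values are placeholders chosen for illustration.
df = pd.DataFrame({"f1": ["A", "A", "B", "B"],
                   "f2": ["X", "Y", "Y", "Z"]})

# Approach 1: concatenate the text values into f_join, then one-hot encode it.
df["f_join"] = df["f1"] + "_" + df["f2"]
f_join_ohe = OneHotEncoder().fit_transform(df[["f_join"]]).toarray()

# Approach 2: one-hot encode f1 and f2 separately, then multiply every column
# of the f1 matrix with every column of the f2 matrix (a per-row outer product).
f1_ohe = OneHotEncoder().fit_transform(df[["f1"]]).toarray()
f2_ohe = OneHotEncoder().fit_transform(df[["f2"]]).toarray()
pair_ohe = np.einsum("ij,ik->ijk", f1_ohe, f2_ohe).reshape(len(df), -1)

# Up to column ordering (and all-zero columns for combinations that never occur),
# f_join_ohe and pair_ohe encode the same combined categories.

# For real-valued features, an interaction is just an element-wise operation,
# e.g. x1 * x2, x1 - x2, or x1 / x2 applied to the raw columns.
```

The concatenation route is usually the simplest to read, while the column-product route is convenient when the features are already one-hot encoded.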
Because the number of features grows rapidly with the order, it becomes difficult to work with them all. Therefore, high order interactions are often constructed semi-manually, and this is an art in some ways.

Additionally, I would like to talk about methods to construct categorical features from decision trees. Take a look at a decision tree: we can map each leaf into a binary feature, and the index of the object's leaf can be used as a value for a new categorical feature. If we use not a single tree but an ensemble of them, for example a random forest, then such an operation can be applied to each of the trees. This is a powerful way to extract high order interactions, and the technique is quite simple to implement. Tree-based models from the sklearn library have an apply method which takes a feature matrix as input and returns the corresponding indices of leaves. xgboost also supports leaf indices via the pred_leaf parameter of its predict method. I suggest you look through the documentation to get more information about these methods and APIs; a short code sketch follows at the end of this transcript.

At the end of this video, let me recap the main points. We examined methods to construct interactions of categorical features, we extended the approach to real-valued features, and we learned how to use trees to extract high order interactions. Thank you for your attention.
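As a closing illustration of the apply and pred_leaf mechanisms mentioned above, here is a minimal sketch (not part of the original lecture). The synthetic data set, the hyperparameters, and the choice to one-hot encode the leaf indices afterwards are all assumptions made for illustration.

```python
import xgboost as xgb  # optional, only needed for the last two lines
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Fit a small forest; each tree partitions the feature space into leaves.
rf = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0).fit(X, y)

# apply() returns, for every object, the index of the leaf it ends up in
# for every tree: an array of shape (n_samples, n_trees).
leaf_indices = rf.apply(X)

# Each column is a new categorical feature; one-hot encode the indices if the
# downstream model (e.g. a linear one) needs numeric input.
leaf_features = OneHotEncoder().fit_transform(leaf_indices)

# The same kind of indices can be obtained from xgboost via pred_leaf=True.
booster = xgb.train({"max_depth": 3}, xgb.DMatrix(X, label=y), num_boost_round=5)
xgb_leaf_indices = booster.predict(xgb.DMatrix(X), pred_leaf=True)
```

The resulting leaf-index columns can then be treated like any other categorical features, for example joined with the original features before training the final model.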