Hi. In this video, we will discuss basic feature generation approaches for datetime and coordinate features. They both differ significantly from numeric and categorical features. Because we can interpret the meaning of datetime and coordinates, we can come up with specific ideas about feature generation, which we'll discuss here.

Now, let's start with datetime. Datetime is quite a distinct feature because of its inherent structure: it has several different tiers like year, day, or week. Most new features generated from datetime can be divided into two categories: first, time moments within a period, and second, time passed since a particular event.

The first one is very simple. We can add features like second, minute, hour, day of week, day of month, day of year, and so on and so forth. This is useful to capture repetitive patterns in the data. If we know about some uncommon periods which influence the data, we can add them as well. For example, if we need to predict the efficiency of a medication, but patients receive pills once every three days, we can consider this as a special time period.

Okay, now, time since a particular event. This event can be either row-independent or row-dependent. In the first case, we just calculate the time passed from one common moment for all the data, for example, since the year 2000. Here, all samples become comparable with each other on one time scale. In the second variant of time since a particular event, the date depends on the sample we are calculating this for. For example, if we need to predict sales in a shop, like in the Rossmann Store Sales competition, we can add the number of days passed since the last holiday, weekend, or sales campaign, or maybe the number of days left until these events.

So, after adding these features, our dataframe can look like this. Date is obviously a date, and sales is the target of this task, while the other columns are generated features. The weekday feature indicates which day of the week this is, daynumber_since_year_2014 indicates how many days have passed since January 1st, 2014, is_holiday is a binary feature indicating whether this day is a holiday, and days_till_holidays indicates how many days are left before the closest holiday.

Sometimes we have several datetime columns in our data. The most straightforward idea here is to subtract one feature from another, or perhaps subtract the generated features we have just discussed: time moments within a period or time passed since a row-independent event. One simple example of such feature generation can be found in the churn prediction task. Basically, churn prediction is about estimating the likelihood that a customer will churn. We may obtain a valuable feature here by subtracting the user registration date from the date of some action of theirs, like purchasing a product or calling customer service. We can see how this works on this dataframe. For every user, we know last_purchase_date and last_call_date. Here we add the difference between them as a new feature named date_diff. For clarity, let's take a look at this figure. For every user, we have their last_purchase_date and their last_call_date. Thus, we can add a date_diff feature which indicates the number of days between these events.

Note that after generating features from datetime, you usually will get either numeric features, like time passed since the year 2000, or categorical features, like day of week.
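To make these datetime ideas a bit more concrete, here is a minimal pandas sketch. The dataframe and the column names (date, last_purchase_date, last_call_date) are hypothetical placeholders for illustration, not the actual data from the slides.

```python
import pandas as pd

# Hypothetical data with several datetime columns.
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-03', '2015-01-04', '2015-01-05']),
    'last_purchase_date': pd.to_datetime(['2014-12-20', '2015-01-01', '2014-11-15']),
    'last_call_date': pd.to_datetime(['2014-12-28', '2014-12-30', '2015-01-02']),
})

# 1) Time moments within a period: capture repetitive patterns.
df['weekday'] = df['date'].dt.dayofweek        # 0 = Monday, ..., 6 = Sunday
df['month'] = df['date'].dt.month
df['day_of_year'] = df['date'].dt.dayofyear

# 2) Time passed since a row-independent event (here: January 1st, 2014).
df['daynumber_since_year_2014'] = (df['date'] - pd.Timestamp('2014-01-01')).dt.days

# 3) Difference between two datetime columns (as in the churn example).
df['date_diff'] = (df['last_purchase_date'] - df['last_call_date']).dt.days

print(df)
```

Features like is_holiday or days_till_holidays would be built the same way, but they require an external list of holiday dates to compare against.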
These features then need to be treated accordingly, with the necessary preprocessing we have discussed earlier.

Now, having discussed feature generation for datetime, let's move on to feature generation for coordinates. Let's imagine that we're trying to estimate the real estate price, like in the Deloitte competition named Western Australia Rental Prices, or in the Sberbank Russian Housing Market competition.

Generally, you can calculate distances to important points on the map. Take this map as an example. If you have additional data with infrastructural buildings, you can add as features the distance to the nearest shop, to the second-closest hospital, to the best school in the neighborhood, and so on. If you do not have such data, you can extract interesting points on the map from your train and test data. For example, you can divide your map into squares with a grid, and within each square find the most expensive flat, and for every other object in this square add the distance to that flat. Or you can organize your data points into clusters, and then use the centers of the clusters as such important points. Or, another way, you can find some special areas, like an area with very old buildings, and add the distance to it.

Another major approach to using coordinates is to calculate aggregated statistics for the objects in the surrounding area. This can include the number of flats around this particular point, which can then be interpreted as the area's popularity, or we can add the mean realty price, which will indicate how expensive the area around the selected point is. Both distances and aggregated statistics are often useful in tasks with coordinates.

One more trick you need to know about coordinates: if you train decision trees on them, you can add slightly rotated coordinates as new features, and this will help the model make more precise splits on the map. It can be hard to know which exact rotation to make, so we may want to add all rotations by 45 or 22.5 degrees. Let's look at the next example of realty price prediction. Here the street divides an area into two parts: the high-priced district above the street, and the low-priced district below it. If the street is slightly rotated, trees will need a lot of splits to approximate it. But if we add new coordinates in which these two districts can be separated by a single split, this will hugely facilitate the tree-building process.

Great, we have just summarized the most frequent methods used for feature generation from datetime and coordinates. For datetime, these are applying periodicity, calculating the time passed since a particular event, and adding differences between two datetime features. For coordinates, we should recall extracting interesting points from train and test data, using places from additional data, calculating distances to the centers of clusters, and adding aggregated statistics for the surrounding area. Knowing how to effectively handle datetime and coordinates, as well as numeric and categorical features, will provide you with a reliable way to improve your score, and help you devise that specific part of a solution which is often required to reach the very top scores.
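To illustrate the coordinate tricks, here is a minimal sketch under simplifying assumptions: plain (latitude, longitude) pairs, scikit-learn's KMeans for cluster centers, and ordinary Euclidean distances with a made-up radius; in practice you would likely use a proper geographic distance and tune the number of clusters and the radius.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (latitude, longitude) pairs for a handful of objects.
coords = np.array([
    [55.75, 37.62],
    [55.76, 37.60],
    [55.70, 37.55],
    [55.80, 37.70],
])

# 1) Use cluster centers as "important points" and add distances to them.
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(coords)
centers = kmeans.cluster_centers_
dist_to_centers = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=2)

# 2) Aggregated statistics for the surrounding area: number of other objects
#    within a fixed radius, a rough proxy for the area's popularity.
radius = 0.05
pairwise = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
neighbors_within_radius = (pairwise < radius).sum(axis=1) - 1  # exclude the point itself

# 3) Slightly rotated coordinates as extra features for tree-based models.
def rotate(xy, degrees):
    theta = np.radians(degrees)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return xy @ rot.T

coords_rot45 = rotate(coords, 45)
coords_rot22 = rotate(coords, 22.5)
```

Each of dist_to_centers, neighbors_within_radius, and the rotated coordinates would then be appended as new columns to the feature matrix alongside the original coordinates.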