In this video, I want to talk about the complexity of real-world machine learning pipelines and how they differ from data science competitions. We will also discuss the philosophy of competitions.

Real-world machine learning problems are very complicated. They include several stages, each of which is very important and requires attention. Let's imagine that we need to build an anti-spam system and consider the basic steps that arise when building such a system. First of all, before doing any machine learning, you need to understand the problem from a business point of view. What do you want to do? Why? How can it help your users? Next, you need to formalize the task. What is the definition of spam? What exactly is to be predicted? The next step is to collect data. You should ask yourself: what data can we use? How can we mine examples of spam and non-spam? Next, you need to take care of cleaning and pre-processing your data. After that, you move on to building models. To do this, you need to answer these questions: which class of model is appropriate for this particular task? How do we measure performance? How do we select the best model?

The next step is to check the effectiveness of the model in a real scenario, to make sure that it works as expected and that no bias was introduced by the learning process. Does the model actually block spam? How often does it block non-spam emails? If everything is fine, the next step is to deploy the model, or in other words, make it available to users. However, the process doesn't end here. You need to monitor the model's performance and retrain it on new data. In addition, you need to periodically revise your understanding of the problem and go through the cycle again and again.

In contrast, in competitions we have a much simpler situation. Everything about formalization and evaluation is already done. All the data is collected and the target metric is fixed.
Therefore, you mainly focus on pre-processing the data, picking models and selecting the best ones. But sometimes you need to understand the business problem in order to get insights or generate a new feature. Also, organizers sometimes allow the use of external data. In such cases, data collection becomes a crucial part of the solution.

I want to show you the difference between real-life applications and competitions more thoroughly. This table shows that competitions are much simpler than real-world machine learning problems. The hardest part, problem formalization and the choice of target metric, is already done. Also, questions related to deployment are out of scope, so participants can focus just on the modeling part. You may notice that in this table the data collection and model complexity rows have both no and yes in the competition column. The reason is that in some competitions you need to take care of these things, but usually that's not the case. I want to emphasize that, as competitors, the only thing we should care about is the value of the target metric. Speed, complexity and memory consumption don't matter as long as you're able to compute the solution and reproduce your own results.

Let's highlight the key points. Real-world machine learning pipelines are very complicated and consist of many stages. Competitions address a lot of things about modeling and data analysis, but in general they don't address the questions of formalization, deployment and testing.

Now, I want to say a few words about the philosophy of competitions, in order to form the right impression. We'll cover these ideas in more detail later in the course, along with examples. The first thing I want to show you is that machine learning competitions are not only about algorithms. An algorithm is just a tool. Anybody can easily use it. You need something more to win. Insights about the data are usually much more useful than a tuned ensemble.
Some competitions can be solved analytically, without any sophisticated machine learning techniques. In this course, we will show you the importance of understanding your data, the tools to use and the features you try to exploit in order to produce the best solution.

The next thing I want to say is: don't limit yourself. Keep in mind that the only thing you should care about is the target metric. It's totally fine to use heuristics or manual data analysis in order to construct a golden feature and improve your model. Besides, don't be afraid of using complex solutions, advanced feature engineering, or running huge calculations overnight. Use every way you can find to improve your model. After completing this course, you will be able to get the maximum gain from your data.

Another important aspect is creativity. You need to know the traditional approaches to solving machine learning problems, but you shouldn't be bound by them. It's okay to modify or hack an existing algorithm for your particular task. Don't be afraid to read source code and change it, especially for deploying stuff. In our course, we'll show you examples of how a little bit of creativity can lead to constructing golden features or entire approaches for solving problems.

In the end, I want to say: enjoy competitions. Don't be obsessed with getting money. The experience and fun you get are much more valuable than the prize. Also, networking is another great advantage of participating in data science competitions. I hope you find this course interesting.