Hello and welcome to our course. In this video, I want to give you a sense of what this course is about, and I think the best way to do that is to talk about our course goals, our course assignments and our course schedule. So, at the broadest level, this course is about gaining the knowledge and expertise required to successfully participate in data science competitions. That's the goal. Now, we're going to approach this in a systematic way. We start in week one with a discussion of competitions: what they are, how they work, and how they differ from real-life industrial data analysis. Then, we'll move on to a recap of the main machine learning models. Besides this, we're going to review software and hardware requirements and common Python libraries for data analysis. After this is done, we'll go through various feature types, how we preprocess these features and how we generate new ones. Because we sometimes need to extract features from text and images, we will also go over the most popular methods for doing so. Finally, we will start working on the final project, the competition. Then we move on to week two. Having figured out methods to work with data frames and models, we'll start covering the things you do first in a competition. This, by the way, is a great opportunity to start working on the final project as we proceed through the material. So, first in this week, we'll analyze the data set in the exploratory data analysis topic, or EDA for short. We'll discuss ways to build intuition about the data, explore anonymized features and clean the data set. Our main instruments here will be logic and visualization. Okay, now, after doing EDA, we switch to validation. Here, we'll spend some time talking about different validation strategies, identifying how the data is split into train and test sets, what problems we may encounter during validation, and ways to address those problems. We finish this week with a discussion of data leakage and leaderboard probing.
We will define data leakage and discuss what leaks are, how to discover them and how to exploit them. So basically, this week we set up the main pipeline for our final project. At this point, you should have an intuition about the data, a reliable validation scheme, and the data checked for leaks. Once this pipeline is ready, we'll focus on improving our solution, and that's week three. In that week, we'll analyze various metrics for regression and classification and figure out ways to optimize them, both while training the model and afterwards. After making sure we measure our models' improvements correctly, we'll define mean encodings and work with the encoded features. Here, we start with categorical features, how mean-encoded features can lead to overfitting and how we counter that overfitting with regularization. Then, we'll discuss several extensions to this approach, including applying mean encodings to numeric features and time series. This is the point where we move on to other advanced features in week four. These include statistics and distance-based features, matrix factorizations, feature interactions and t-SNE. These features are often the key to superior performance in competitions, so you should implement and optimize them here for the final project. After this, we'll get to hyperparameter optimization. Here, we will revisit your knowledge about model tuning in a systematic way and let you apply it to the competition. Then, we move on to the practical guide, where all of us have summarized the most important lessons about competitions, which became clear to us after a few years of participation. These include some general advice on how to choose and participate in a competition, and some technical advice on how to set up your pipeline, what to do first and so on. Finally, we'll conclude this week by working on ensembles with KazAnova, Kaggle's top-ranked competitor.
We'll start with simple linear ensembles, then continue with bagging and boosting, and finally cover stacking and the StackNet approach. By the end of this week, you should have all the knowledge required to succeed in a competition. And then, finally, we've got the last week. Here we will analyze some of our winning competition solutions. But mostly, in the last week we are wrapping up the course and working on and submitting the final project. So, that's the basic structure of this course. As we move through those sections, you can practice your skills in the course assignments, and there are three basic types of assignments in this class: quizzes, programming assignments and the final project. You don't have to do all of these to pass the class. You only need to complete the required assignments, and you can see which ones those are on the course website. Let's go ahead and talk about the assignments. We begin with the competition. This is going to be your main assignment. In fact, we start working on it in week two. There we do EDA, set up the main pipeline that you'll use for the rest of the course, and check the competition for leaks. Then, in week three, we update our solution by optimizing the given metric and adding mean-encoded features. After that, in week four, we further improve our solution by working on advanced features, tuning hyperparameters and combining models into an ensemble. In the last week, we wrap it all up and produce a solution that meets the standards for Kaggle winning models. We ask you to work on the project on your local machine or your own server, because Coursera's computational resources are limited and using them for the final project can slow down the programming assignments for your fellow students.
In fact, this class is mostly about this project, the competition assignment, but we also have quizzes and programming assignments for you. We include these to give you an opportunity to refine your knowledge of specific parts of the course: how to check data for leaks, how to implement mean encodings, how to produce an ensemble and so on. You can do them directly on the Coursera site, but you can also download the notebooks and complete them on your local computer or your own server. So that's an overview of the course goals, course schedule and course assignments. Let's go ahead and get started.