[MUSIC] In this video, we will talk about tricks that make training new neural networks much faster. The first one is transfer learning. Remember that deep networks learn a complex feature extractor, but we need lots of data to train one from scratch. Let's look at the ImageNet classification architecture. It has lots of convolutional layers, and we call them a feature extractor, because the last convolutional layer extracts features that are useful for classification with an MLP, the last pink layers. This architecture is trained on the ImageNet dataset. But what if we could reuse an existing feature extractor, the blue convolutions on the slide, for a new task? How do we do that? We add a new classifier on top of those features, and those orange weights are all we need to train. You need less data, because you train only the final MLP layers. This works if the domain of the new task is similar to ImageNet's. It won't work for human emotion classification, because ImageNet doesn't have human faces in the dataset, so it doesn't know the concept of a human face.

But what if we need to classify human emotions? Maybe we can partially reuse the ImageNet feature extractor. Let's look at the perfect feature extractor that we need. It looks like a bunch of convolutional layers, so let's look at what activations we actually want to get. The first convolutional layer will have its highest activations on edge detectors at different orientations. If we go deeper, the convolutional layers learn the concept of a human eye, nose, or mouth. And if we go deeper than that, we have layers that learn the representation of a human face. That is the perfect feature extractor we want, so let's compare it with the ImageNet feature extractor. ImageNet definitely has those edge detectors as well, but since it doesn't have human faces in the dataset, it doesn't know the concept of a nose or a mouth, so we will need to train those layers ourselves.

Let's look at how an architecture for human emotion classification might look. We reuse the first convolutional layers, which are in green, and we add new convolutional layers and a new multi-layer perceptron for our new task. All we need to train are those blue convolutions and the orange fully connected layers. It works much better because we have far fewer parameters to train.

What if we don't start from scratch and don't initialize those blue convolutions with random numbers, but rather use the initialization from a pre-trained ImageNet network? That leads us to the so-called fine-tuning technique: you don't start with a random initialization, but rather reuse the complex representations that were learned for ImageNet classification. What is the intuition behind this? We don't start with random features, but with features that are useful for some other task. They are not perfect for our task, but they might be much better than random. What we do next is propagate all the gradients, but with a smaller learning rate, so that we don't lose the initialization that we got from ImageNet. Fine-tuning is used very frequently thanks to the wide spectrum of ImageNet classes. Keras, a deep learning framework, ships the weights of pre-trained VGG, Inception, and ResNet architectures. What is especially useful is that you can fine-tune a bunch of different architectures and make an ensemble out of them, and you don't have to wait two or three weeks to train your network on the ImageNet dataset. A sketch of this workflow in code follows below.
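To make this concrete, here is a minimal sketch of transfer learning followed by fine-tuning, using the pre-trained VGG16 weights that Keras provides. It assumes TensorFlow 2.x; the dataset objects train_ds and val_ds and the value of NUM_CLASSES are hypothetical placeholders, not anything specified in the video.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 7  # hypothetical number of emotion classes

# Reuse the ImageNet feature extractor (convolutional layers only).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze: transfer learning trains only the new head

# New classifier head (the "orange" fully connected layers).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Assumes one-hot labels; train_ds and val_ds are placeholders.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Fine-tuning: unfreeze the base and continue training with a much
# smaller learning rate, so the ImageNet initialization is not destroyed.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```

Note that the model has to be recompiled after changing the trainable flag, and the fine-tuning learning rate is kept two orders of magnitude smaller, which matches the advice above about propagating gradients gently through the pre-trained layers.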
Let's summarize a little bit. If you have a small dataset and it is from the ImageNet domain, which means your objects are somewhat similar to those seen in the ImageNet dataset, then all you need to do is use transfer learning and train the last MLP layers. If you have a bigger dataset, then it makes sense to fine-tune deeper layers as well, so that you squeeze a little more quality out of the network. If you have a big dataset that is not similar to the ImageNet domain, then it makes sense to train from scratch, because most likely you can't reuse the features from ImageNet. But if you have a small dataset that is not similar to ImageNet, then you are in a tough spot, and most likely you will have to collect more data. This decision guide is written out in the code sketch below.

In the next video, we will take a look at other computer vision problems that utilize convolutional networks. [MUSIC]
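For reference, here is the decision guide from the summary above as a minimal Python sketch. The video gives no numeric thresholds for "small" or "similar", so those judgments are left to the caller, and the wording of each strategy is my own paraphrase of the summary.

```python
def choose_strategy(dataset_is_small: bool, similar_to_imagenet: bool) -> str:
    """Pick a training strategy following the summary in the video."""
    if similar_to_imagenet:
        if dataset_is_small:
            return "transfer learning: freeze ImageNet features, train the last MLP layers"
        return "fine-tuning: also retrain deeper layers with a small learning rate"
    if dataset_is_small:
        return "collect more data (ImageNet features are unlikely to transfer)"
    return "train from scratch"

# Example: a small dataset of faces, which are not an ImageNet domain.
print(choose_strategy(dataset_is_small=True, similar_to_imagenet=False))
```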