[MUSIC] In this video, we will talk about tricks that make training new neural networks much faster. The first one is transfer learning. Remember that deep networks learn a complex feature extractor, but we need lots of data to train one from scratch. Let's look at the ImageNet classification architecture. It has lots of convolutional layers, and we call them a feature extractor, because the last convolutional layer extracts features that are useful for classification with an MLP, the last pink layers. This architecture is trained on the ImageNet dataset. But what if we could reuse an existing feature extractor, the blue convolutions on the slide, for a new task? How do we do that? We add a new classifier on top of those features, and those orange weights are all we need to train. You need less data, because you train only the final MLP layers. This works if the domain of the new task is similar to ImageNet's. It won't work for human emotion classification, because ImageNet doesn't have human faces in the dataset, so it doesn't know the concept of a human face.

But what if we need to classify human emotions? Maybe we can partially reuse the ImageNet feature extractor. Let's look at the perfect feature extractor that we need. It looks like a bunch of convolutional layers, so let's look at what activations we actually want to get. The first convolutional layer will have its highest activations on edge detectors at different orientations. If we go deeper, the convolutional layers learn the concept of a human eye, nose, or mouth. And if we go deeper than that, we have layers that learn the representation of a human face. That is the perfect feature extractor we want, so let's compare it with the ImageNet feature extractor. ImageNet definitely has those edge detectors as well, but since it doesn't have human faces in the dataset, it doesn't know the concept of a nose or a mouth, so we will need to train those layers ourselves.

Let's look at how an architecture for human emotion classification might look. We reuse the first convolutional layers, which are in green, and we add new convolutional layers and a new multi-layer perceptron for our new task. All we need to train are those blue convolutions and the orange fully connected layers. It works much better because we have far fewer parameters to train.

What if we don't start from scratch and don't initialize those blue convolutions with random numbers, but rather use the initialization from a pre-trained ImageNet network? That leads us to the so-called fine-tuning technique: you don't start with a random initialization, but rather reuse the complex representations that were learned for ImageNet classification. What is the intuition behind this? We don't start with random features, but with features that are useful for some other task. They are not perfect for our task, but they might be much better than random. What we do next is propagate all the gradients, but with a smaller learning rate, so that we don't lose the initialization that we got from ImageNet. Fine-tuning is used very frequently thanks to the wide spectrum of ImageNet classes. Keras, a deep learning framework, ships the weights of pre-trained VGG, Inception, and ResNet architectures. What is especially useful is that you can fine-tune a bunch of different architectures and make an ensemble out of them, and you don't have to wait two or three weeks to train your network on the ImageNet dataset. A sketch of this workflow in code follows below.
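To make this concrete, here is a minimal sketch of transfer learning followed by fine-tuning, using the pre-trained VGG16 weights that Keras provides. It assumes TensorFlow 2.x; the dataset objects train_ds and val_ds and the value of NUM_CLASSES are hypothetical placeholders, not anything specified in the video.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 7  # hypothetical number of emotion classes

# Reuse the ImageNet feature extractor (convolutional layers only).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze: transfer learning trains only the new head

# New classifier head (the "orange" fully connected layers).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Assumes one-hot labels; train_ds and val_ds are placeholders.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Fine-tuning: unfreeze the base and continue training with a much
# smaller learning rate, so the ImageNet initialization is not destroyed.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```

Note that the model has to be recompiled after changing the trainable flag, and the fine-tuning learning rate is kept two orders of magnitude smaller, which matches the advice above about propagating gradients gently through the pre-trained layers.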
Let's summarize a little bit. If you have a small dataset and it is from the ImageNet domain, which means your objects are somewhat similar to those seen in the ImageNet dataset, then all you need to do is use transfer learning and train the last MLP layers. If you have a bigger dataset, then it makes sense to fine-tune deeper layers as well, so that you squeeze a little more quality out of the network. If you have a big dataset that is not similar to the ImageNet domain, then it makes sense to train from scratch, because most likely you can't reuse the features from ImageNet. But if you have a small dataset that is not similar to ImageNet, then you are in a tough spot, and most likely you will have to collect more data. This decision guide is written out in the code sketch below.

In the next video, we will take a look at other computer vision problems that utilize convolutional networks. [MUSIC]
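For reference, here is the decision guide from the summary above as a minimal Python sketch. The video gives no numeric thresholds for "small" or "similar", so those judgments are left to the caller, and the wording of each strategy is my own paraphrase of the summary.

```python
def choose_strategy(dataset_is_small: bool, similar_to_imagenet: bool) -> str:
    """Pick a training strategy following the summary in the video."""
    if similar_to_imagenet:
        if dataset_is_small:
            return "transfer learning: freeze ImageNet features, train the last MLP layers"
        return "fine-tuning: also retrain deeper layers with a small learning rate"
    if dataset_is_small:
        return "collect more data (ImageNet features are unlikely to transfer)"
    return "train from scratch"

# Example: a small dataset of faces, which are not an ImageNet domain.
print(choose_strategy(dataset_is_small=True, similar_to_imagenet=False))
```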