[MUSIC] So, I've just promised you a lot of cool stuff that you can do with unsupervised learning. Now, let's cover how you do this, because otherwise it would be a cheat. As I've mentioned, there are many methods at play here, but let's start from the simplest to understand and the most general one: the autoencoders. Autoencoders are the kind of models that encode the data into a hidden representation and then decode it back. Now, this seems like a weird problem unless you want to compress the data, but trust me, they hold a lot of surprises. Again, autoencoders consist of two parts, as the name suggests: an encoder and a decoder. If your data is denoted by x, then you can encode x, maybe images, cat images, into a hidden representation enc(x), so that you can then decode it back with the decoder into the original representation. The mathematical objective here is, again, weird: you want to compress an image and decompress it back so that the decompression is as lossless as possible. The result should resemble the initial image in the sense of minimizing the pixel-wise MSE, mean squared error, to be accurate. Now, this is immediately useful when you want to compress the data, but the representation you learn is also very useful if you want to apply classification or regression methods on top of it. For example, you could take raw image pixels, and you probably know that in most cases, for example, gradient boosting is useless when applied to raw pixels. But instead, you can feed it not with the raw pixels, but with the hidden representation that you found with an autoencoder. Well, this is all nice and good, but, in fact, you've already learned some kind of autoencoder if you've studied even the basic topics of machine learning, because you probably already know such things as principal component analysis, singular value decomposition, or maybe non-negative matrix factorization.
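To make the objective concrete, here is a minimal numpy sketch of the idea: a tiny linear encoder and decoder (just two weight matrices, names are illustrative) trained by plain gradient descent to minimize the mean squared reconstruction error on toy data. It is not the lecture's exact model, just the enc/dec-plus-MSE recipe in its simplest form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 10 features that really live on a 3-D subspace.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(200, 3)) @ basis

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Encoder and decoder are single linear maps here; any differentiable
# pair of maps would do (weight shapes: 10 -> 3 hidden -> 10).
W_enc = rng.normal(scale=0.1, size=(10, 3))
W_dec = rng.normal(scale=0.1, size=(3, 10))

lr = 0.01
losses = []
for step in range(500):
    H = X @ W_enc                 # encode: hidden representation
    X_hat = H @ W_dec             # decode: reconstruction
    err = X_hat - X
    losses.append(mse(X, X_hat))
    # Gradients of the MSE reconstruction loss w.r.t. both weight matrices.
    g_dec = H.T @ err * (2 / X.size)
    g_enc = X.T @ (err @ W_dec.T) * (2 / X.size)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print(losses[0], losses[-1])      # reconstruction error drops during training
```

After training, `X @ W_enc` is exactly the hidden representation you could hand to a downstream classifier or to gradient boosting instead of raw pixels.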
In fact, those are all familiar to you if you use scikit-learn or caret. The general idea behind all those methods is that they take a large matrix, usually the object-feature matrix of your dataset (your pixels go here if you chose a particular image), and try to represent this matrix as a product of two or more matrices. For example, one matrix maps your data, your full row, into some hidden representation, and a second matrix maps the hidden representation back into the original pixel-wise representation. You try to learn a couple or more matrices, depending on your method, to minimize some kind of reconstruction error. For singular value decomposition, one way to do so is to minimize the mean squared error between your original matrix and the product of the two substitute matrices. Now look at this matrix decomposition thingy differently: one way to rewrite it is as a process that first takes your data and kind of compresses it (here's the encoder part, which compresses it linearly into a hidden representation), and the second part then becomes the decoder, which takes your hidden representation and converts it back into pixels or whatever form the data was in, so as to minimize the mean squared error between what was fed into the network and what emerged from it. Now, one natural way to expand this, as we usually do with neural networks, is to pretend that linear compression and linear decompression are somehow insufficient for us and make them nonlinear, with nonlinear activations, of course. You take your encoder and, instead of a single linear transformation, stick in a few dense layers, or maybe other layers that you've learned about, maybe with some dropout or whatever fancy names you remember. And then your autoencoder becomes nonlinear. And as we probably know, or believe since the last two weeks, nonlinear representations can be more powerful in the sense that they can learn more abstract features.
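This "SVD is a linear autoencoder" view can be written down directly. The sketch below (illustrative, using `numpy.linalg.svd`) keeps the top-k right singular vectors: projecting onto them is the linear encoder, projecting back is the linear decoder, and truncated SVD is exactly the rank-k choice that minimizes the mean squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Object-feature matrix: 100 "objects" with 20 raw features, rank 3 by construction.
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))

# SVD factors X into U @ diag(s) @ Vt; keeping the top-k components gives
# the best rank-k approximation in the mean-squared-error sense.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
V_k = Vt[:k].T                    # 20 x k

def encode(x):
    # Linear "encoder": project a row onto the top-k directions.
    return x @ V_k

def decode(h):
    # Linear "decoder": map hidden codes back to the original features.
    return h @ V_k.T

X_hat = decode(encode(X))
reconstruction_error = np.mean((X - X_hat) ** 2)
print(reconstruction_error)       # ~0 here, since the data is exactly rank 3
```

Swapping these two linear maps for stacks of dense layers with activations is precisely the "make it nonlinear" step described above.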
And the question is, imagine your data format is not just an arbitrary set of features, but an image. So there are three channels, RGB, with, say, a 100 by 100 pixel grid. Is there maybe some particular architecture that you can use to compress the data and decompress it afterwards, so that your features have some nice properties that are desirable for images, like being able to shift the same feature a bit to the right and still have this feature recognized? Yes, right, one way to deal with it is to use convolutional layers, or a convolutional architecture in general. So, on the slide we have this super small architecture: one layer, one convolution, one pooling. But you could, of course, use a lot of stacked convolutions and poolings, or maybe some residual layers or inception modules, whatever you prefer for a particular problem. The general idea is that anything that maps your input into a hidden representation, together with anything that maps it back from the hidden representation to the original one, fits as an autoencoder, provided it's differentiable, of course. And since it's that easy, you can even do without dense layers at all: you can take, say, a convolutional encoder and then go straight to a convolutional decoder. This way your hidden representation is in a small image-like format. [MUSIC]
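As a shape-only sketch of that last point, here is a naive numpy version of a fully convolutional autoencoder: a single (random, untrained) convolution plus 2x2 pooling as the encoder, and nearest-neighbour upsampling plus another convolution as the decoder. The kernels and helper names are made up for illustration; the point is only that the hidden representation stays a small image-like grid, with no dense layers anywhere.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv2d(img, kernel):
    """Naive 'valid' 2-D convolution (single channel, no padding, no stride)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2(img):
    """2x2 max pooling: halves both spatial dimensions."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbour 2x upsampling, the simplest kind of 'un-pooling'."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

# A 100x100 single-channel "image" and two random, untrained 3x3 kernels.
x = rng.normal(size=(100, 100))
k_enc = rng.normal(size=(3, 3))
k_dec = rng.normal(size=(3, 3))

# Encoder: pad for a 'same'-size convolution, then pool -> 50x50 hidden grid.
hidden = maxpool2(conv2d(np.pad(x, 1), k_enc))
# Decoder: upsample back to 100x100, then another 'same'-size convolution.
recon = conv2d(np.pad(upsample2(hidden), 1), k_dec)

print(hidden.shape, recon.shape)  # (50, 50) (100, 100)
```

In a real model you would train both kernels (and usually many channels of them) end to end on the same pixel-wise MSE objective as before; here they are random, since only the architecture is being illustrated.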