[MUSIC] So, I've just promised you a lot of cool stuff that you can do with unsupervised learning. Now, let's cover how you do this, because otherwise it would be a cheat. As I've mentioned, there are many methods at play here, but let's start from the simplest to understand and the most general one: the autoencoders. Autoencoders are the kind of models that encode the data into a hidden representation and then decode it back. Now, this seems like a weird problem unless you want to compress the data, but trust me, they hold a lot of surprises. Again, autoencoders consist of two parts, as the name suggests: an encoder and a decoder. If your data is denoted by x, then you can encode x, maybe images, cat images, into a hidden representation enc(x), so that you can then decode it back with the decoder into the original representation. The mathematical objective here is, again, weird: you want to compress an image and decompress it back so that the decompression is as lossless as possible. The result should resemble the initial image in the sense of minimizing the pixel-wise MSE, mean squared error, to be accurate. Now, this is immediately useful when you want to compress the data, but the representation you learn is also very useful if you want to apply classification or regression methods on top of it. For example, you could take raw image pixels, and you probably know that in most cases, for example, gradient boosting is useless when applied to raw pixels. But instead, you can feed it not with the raw pixels, but with the hidden representation that you found with an autoencoder. Well, this is all nice and good, but, in fact, you've already learned some kind of autoencoder if you've studied even the basic topics of machine learning, because you probably already know such things as principal component analysis, singular value decomposition, or maybe non-negative matrix factorization.
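To make the objective concrete, here is a minimal numpy sketch of the idea: a tiny linear encoder and decoder (just two weight matrices, names are illustrative) trained by plain gradient descent to minimize the mean squared reconstruction error on toy data. It is not the lecture's exact model, just the enc/dec-plus-MSE recipe in its simplest form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 10 features that really live on a 3-D subspace.
basis = rng.normal(size=(3, 10))
X = rng.normal(size=(200, 3)) @ basis

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Encoder and decoder are single linear maps here; any differentiable
# pair of maps would do (weight shapes: 10 -> 3 hidden -> 10).
W_enc = rng.normal(scale=0.1, size=(10, 3))
W_dec = rng.normal(scale=0.1, size=(3, 10))

lr = 0.01
losses = []
for step in range(500):
    H = X @ W_enc                 # encode: hidden representation
    X_hat = H @ W_dec             # decode: reconstruction
    err = X_hat - X
    losses.append(mse(X, X_hat))
    # Gradients of the MSE reconstruction loss w.r.t. both weight matrices.
    g_dec = H.T @ err * (2 / X.size)
    g_enc = X.T @ (err @ W_dec.T) * (2 / X.size)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print(losses[0], losses[-1])      # reconstruction error drops during training
```

After training, `X @ W_enc` is exactly the hidden representation you could hand to a downstream classifier or to gradient boosting instead of raw pixels.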
In fact, those are all familiar to you if you use scikit-learn or caret. The general idea behind all those methods is that they take a large matrix, usually the object-feature matrix of your dataset (your pixels go here if you chose a particular image), and try to represent this matrix as a product of two or more matrices. For example, one matrix maps your data, your full row, into some hidden representation, and a second matrix maps the hidden representation back into the original pixel-wise representation. You try to learn a couple or more matrices, depending on your method, to minimize some kind of reconstruction error. For singular value decomposition, one way to do so is to minimize the mean squared error between your original matrix and the product of the two substitute matrices. Now look at this matrix decomposition thingy differently: one way to rewrite it is as a process that first takes your data and kind of compresses it (here's the encoder part, which compresses it linearly into a hidden representation), and the second part then becomes the decoder, which takes your hidden representation and converts it back into pixels or whatever form the data was in, so as to minimize the mean squared error between what was fed into the network and what emerged from it. Now, one natural way to expand this, as we usually do with neural networks, is to pretend that linear compression and linear decompression are somehow insufficient for us and make them nonlinear, with nonlinear activations, of course. You take your encoder and, instead of a single linear transformation, stick in a few dense layers, or maybe other layers that you've learned about, maybe with some dropout or whatever fancy names you remember. And then your autoencoder becomes nonlinear. And as we probably know, or believe since the last two weeks, nonlinear representations can be more powerful in the sense that they can learn more abstract features.
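This "SVD is a linear autoencoder" view can be written down directly. The sketch below (illustrative, using `numpy.linalg.svd`) keeps the top-k right singular vectors: projecting onto them is the linear encoder, projecting back is the linear decoder, and truncated SVD is exactly the rank-k choice that minimizes the mean squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Object-feature matrix: 100 "objects" with 20 raw features, rank 3 by construction.
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))

# SVD factors X into U @ diag(s) @ Vt; keeping the top-k components gives
# the best rank-k approximation in the mean-squared-error sense.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
V_k = Vt[:k].T                    # 20 x k

def encode(x):
    # Linear "encoder": project a row onto the top-k directions.
    return x @ V_k

def decode(h):
    # Linear "decoder": map hidden codes back to the original features.
    return h @ V_k.T

X_hat = decode(encode(X))
reconstruction_error = np.mean((X - X_hat) ** 2)
print(reconstruction_error)       # ~0 here, since the data is exactly rank 3
```

Swapping these two linear maps for stacks of dense layers with activations is precisely the "make it nonlinear" step described above.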
And the question is, imagine your data format is not just an arbitrary set of features, but an image. So there are three channels, RGB, with, say, a 100 by 100 pixel grid. Is there maybe some particular architecture that you can use to compress the data and decompress it afterwards, so that your features have some nice properties that are desirable for images, like being able to shift the same feature a bit to the right and still have this feature recognized? Yes, right, one way to deal with it is to use convolutional layers, or a convolutional architecture in general. So, on the slide we have this super small architecture: one layer, one convolution, one pooling. But you could, of course, use a lot of stacked convolutions and poolings, or maybe some residual layers or inception modules, whatever you prefer for a particular problem. The general idea is that anything that maps your input into a hidden representation, together with anything that maps it back from the hidden representation to the original one, fits as an autoencoder, provided it's differentiable, of course. And since it's that easy, you can even do without dense layers at all: you can take, say, a convolutional encoder and then go straight to a convolutional decoder. This way your hidden representation is in a small image-like format. [MUSIC]
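As a shape-only sketch of that last point, here is a naive numpy version of a fully convolutional autoencoder: a single (random, untrained) convolution plus 2x2 pooling as the encoder, and nearest-neighbour upsampling plus another convolution as the decoder. The kernels and helper names are made up for illustration; the point is only that the hidden representation stays a small image-like grid, with no dense layers anywhere.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv2d(img, kernel):
    """Naive 'valid' 2-D convolution (single channel, no padding, no stride)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2(img):
    """2x2 max pooling: halves both spatial dimensions."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbour 2x upsampling, the simplest kind of 'un-pooling'."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

# A 100x100 single-channel "image" and two random, untrained 3x3 kernels.
x = rng.normal(size=(100, 100))
k_enc = rng.normal(size=(3, 3))
k_dec = rng.normal(size=(3, 3))

# Encoder: pad for a 'same'-size convolution, then pool -> 50x50 hidden grid.
hidden = maxpool2(conv2d(np.pad(x, 1), k_enc))
# Decoder: upsample back to 100x100, then another 'same'-size convolution.
recon = conv2d(np.pad(upsample2(hidden), 1), k_dec)

print(hidden.shape, recon.shape)  # (50, 50) (100, 100)
```

In a real model you would train both kernels (and usually many channels of them) end to end on the same pixel-wise MSE objective as before; here they are random, since only the architecture is being illustrated.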