[MUSIC] Welcome back. We just studied how deep learning works and how to train your neural networks. And with some luck, you've already made it through the practical assignments. So you basically know that deep learning can help you when other models just don't cut it. And you probably hope that this success will repeat itself on other problems. To a large extent this is true, but today, let's talk about where it isn't, about what deep learning is not. You already know some of these things; now, let's talk about some of the limitations.

The first thing deep learning is not is magic. It won't just solve all your problems for you. It won't be a silver bullet that you can simply unpack from the box and hope that it gets much better results than anything you've tried previously for years. This is what a lot of people expect from neural networks, but please don't, because it won't solve your problems for free. Instead, deep learning is just a practical field. It has its strengths, and we'll talk about them in the second part. But it also has its weak points.

For one thing, deep learning lacks a core theoretical understanding. That sounds like a lame accusation to make against a practical field, where the absence of a theory isn't obviously preventing it from working. But the problem is that when you try to build an architecture or develop something new for a model, the absence of a theoretical core that can explain things for you forces you to do a lot more experimentation. In fact, deep learning only offers you some intuitions, like "this works", or "this idea kind of applies everywhere you have this situation", and so on. But those intuitive rules are not 100% accurate, and that's a problem if you want to develop something new.

Another problem is that, in exchange for capturing complex dependencies, neural networks and deep learning models in general have a lot of parameters. This not only means that they can capture complex dependencies in the data, but also that they can overfit tremendously. In practice it means that for any problem, if you use a neural network, you generally need a much larger dataset to train on than you would with linear models or decision trees. Whenever you end up in some new area which is not image classification or text processing, you'll sometimes find that, for practical reasons, it's better to use decision trees or even linear models.

Finally, deep learning models are computationally heavy. So whenever you want your machine learning to run super fast or to require as little memory as possible, say on smartphones or embedded systems, you'll generally have to do some, again, dark magic to make your neural network run as fast as you require. This isn't true for, say, linear models, which apply almost instantaneously.

There's one more disadvantage. It's, well, a hard one to fix, although it has some upsides: deep learning is pathologically overhyped. To be fair, machine learning, the broader domain that deep learning belongs to, is overhyped as well. But deep learning is the most advertised, the hottest topic within the hottest area of applied mathematics, which is machine learning. This is good, because deep learning attracts a lot of talented researchers and talented practitioners. But the problem is that, since it's so hyped, a lot of people expect wonders from it.
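To make the parameter count concrete, here is a minimal sketch, assuming TensorFlow/Keras (the framework and the layer sizes are my own illustrative choices, not something from the lecture): even a small two-hidden-layer network on 100 input features has roughly a thousand times more weights than a linear model on the same input, which is why it needs far more data to avoid overfitting.

import tensorflow as tf

n_features = 100  # hypothetical: 100 input features

# Linear model: a single output unit on top of the inputs.
linear_model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(1),
])

# Small MLP: two hidden layers of 256 units each, then one output unit.
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(1),
])

print(linear_model.count_params())  # 101 trainable weights
print(mlp.count_params())           # roughly 92,000 trainable weights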
So sometimes, if you're trying to apply deep learning in business, you'll find yourself in the company of people who don't understand deep learning. They believe that it's some super artificial intelligence, big data, blah blah, yada yada yada, that will get you to the top position in the business and solve all your problems for you. So not only should you not expect deep learning to work wonders yourself, because wonders, as you know, require a lot of hard work; you also have to contend with other people who expect otherwise.

Now, all those arguments paint a rather grim picture of what deep learning is, but there are a lot of positive sides to it as well. For one, you can think of deep learning as a kind of machine learning language. Like any language, it is a tool to express something. A natural language is a tool with which you can express things to other humans. A programming language is a means to express what you want your computer to do, in a way that the computer can execute. And deep learning, in turn, is a language that allows you to hint your machine learning model about what you want this model to learn: hint it about what kind of features you want it to have, and what kind of expert knowledge can be applied to this dataset.

Let's work through a few examples to prove this point. Say you have a usual classification problem. You have two sets of features: the raw low-level features, and the high-level features. And you want to predict some kind of target given these. This whole thing is beginning to sound a little abstract, so let's get to a concrete scenario. Say you want to run a regression on the price of a car, a second-hand car to be accurate. You have, well, a photo of the car, and some high-level features like the brand, the model, maybe the production date, and any blemishes or enhancements installed on this car. What you want to do is build a model that uses both of those feature types. And the simplest way to do so is to just concatenate them and feed them into whatever model you're using, a neural network in our case. Of course you can do that, but the problem is that this approach is kind of inefficient.

If we speak about neural networks, the resulting model would look like this, for example. And the main problem with this model is that the first dense layer tries to combine two worlds, two domains of features, and it tries to combine them linearly. So what it does is take the age, measured in years or months, multiply it by some coefficient, and add it up with a pixel intensity. It's technically possible, I mean, no one will punish you for doing so unless there's a physicist nearby. But it doesn't make much sense, and in practical applications this architecture tends to work worse than it otherwise could.

What you can do is say the following thing in this language: you want to build a representation for those raw features which is as complex as the high-level ones. The way you can express this is by, well, stacking more layers. Basically, you now have two branches of data, and for some amount of time you process them independently. You have the raw features and you apply dense layers, maybe two or three stacked dense layers, that only extract features from the raw image pixels. And only then, once you've got those features, like the presence of a blemish, or maybe a crack on the windshield, or anything like that, only then do you combine those features with the high-level features you've got.
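To make the two-branch idea concrete, here is a rough sketch using the Keras functional API; the framework choice, the flattened 100-by-100 image input, and the layer widths are illustrative assumptions of mine, not something given in the lecture.

import tensorflow as tf
from tensorflow.keras import layers

# Two inputs: flattened 100x100 image pixels, and 100 high-level attributes.
pixels = tf.keras.Input(shape=(10000,), name='raw_pixels')
attributes = tf.keras.Input(shape=(100,), name='high_level_features')

# The naive alternative would be a single dense layer over the concatenation:
# naive = layers.Dense(256, activation='relu')(layers.concatenate([pixels, attributes]))

# Two-branch version: first distill the raw pixels through a few stacked
# dense layers into more abstract features...
x = layers.Dense(256, activation='relu')(pixels)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dense(64, activation='relu')(x)

# ...and only then combine them with the hand-crafted high-level features.
combined = layers.concatenate([x, attributes])
hidden = layers.Dense(64, activation='relu')(combined)
price = layers.Dense(1, name='price')(hidden)  # car price regression output

model = tf.keras.Model(inputs=[pixels, attributes], outputs=price)
model.compile(optimizer='adam', loss='mse')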
Now, this makes slightly more sense, although it's still not the perfect model. Generally, stacking more layers to extract features also gives you more abstract kinds of features, and if you stack enough layers, you'll eventually get features that are easy to combine with the high-level ones.

Let's now consider a similar, although slightly different, problem. This time we're still solving car price regression, but we also want to infuse another piece of prior knowledge. Let's say that, through some kind of external information we've got, we don't want our network to trust the image data too enthusiastically. For example, I might be unwilling to trust the car dealers that much: let's say that some of their images have been shown to be too optimistic, showing a car in a condition better than the actual one. By default, our network does the exact opposite. It trusts the images too much, because there are, say, 10,000 image pixels, a 100 by 100 pixel image, and only, say, 100 attributes that are high-level features. So we want to do the opposite.

You can of course achieve this by means of the usual machine learning machinery, simply by regularizing the part of the model that works on the raw pixel features more heavily. But in deep learning, you can also do this by means of architecture. In this case, we have introduced the thing called a bottleneck layer: this one layer with 32 units, which is much smaller than any other layer. And it is a bottleneck, so any information that the neural network takes from the image has to go through this layer. It kind of limits the amount of useful features your model can get from the image, and biases the model toward trusting the raw image features less. This is of course not guaranteed. So, technically, if you fit your model for too long, it might just encode everything in some super-complex non-linear dependency and still get all the information through. But it's one way you can approach this problem. [MUSIC]
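Here is the same sketch with the bottleneck variant, again assuming Keras; apart from the 32-unit bottleneck mentioned in the lecture, the input and layer sizes are just illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers

pixels = tf.keras.Input(shape=(10000,), name='raw_pixels')  # 100x100 image, flattened
attributes = tf.keras.Input(shape=(100,), name='high_level_features')

# Image branch ending in a narrow 32-unit bottleneck: everything the model
# learns from the pixels has to squeeze through these 32 numbers.
x = layers.Dense(256, activation='relu')(pixels)
x = layers.Dense(128, activation='relu')(x)
bottleneck = layers.Dense(32, activation='relu', name='bottleneck')(x)

# The 100 high-level attributes now meet only 32 image-derived features,
# which biases the combined layers toward trusting the attributes more.
combined = layers.concatenate([bottleneck, attributes])
hidden = layers.Dense(64, activation='relu')(combined)
price = layers.Dense(1, name='price')(hidden)

model = tf.keras.Model(inputs=[pixels, attributes], outputs=price)
model.compile(optimizer='adam', loss='mse')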