1
00:00:00,000 --> 00:00:02,400
To build a Neural Style Transfer system,

2
00:00:02,400 --> 00:00:05,730
let's define a cost function for the generated image.

3
00:00:05,730 --> 00:00:09,953
What you see later is that by minimizing this cost function,

4
00:00:09,953 --> 00:00:12,270
you can generate the image that you want.

5
00:00:12,270 --> 00:00:15,231
Remember what the problem formulation is.

6
00:00:15,231 --> 00:00:17,400
You're given a content image C,

7
00:00:17,400 --> 00:00:21,510
given a style image S and you goal is to generate a new image

8
00:00:21,510 --> 00:00:26,180
G. In order to implement neural style transfer,

9
00:00:26,180 --> 00:00:34,080
what you're going to do is define a cost function J of G that measures how good is

10
00:00:34,080 --> 00:00:37,920
a particular generated image and we'll use gradient to

11
00:00:37,920 --> 00:00:42,425
descent to minimize J of G in order to generate this image.

12
00:00:42,425 --> 00:00:44,490
How good is a particular image?

13
00:00:44,490 --> 00:00:48,460
Well, we're going to define two parts to this cost function.

14
00:00:48,460 --> 00:00:52,190
The first part is called the content cost.

15
00:00:52,190 --> 00:00:56,480
This is a function of the content image and of the generated image and

16
00:00:56,480 --> 00:01:00,960
what it does is it measures how similar is the contents of the generated image

17
00:01:00,960 --> 00:01:05,495
to the content of the content image C. And then going to

18
00:01:05,495 --> 00:01:10,345
add that to a style cost function which is now a function of

19
00:01:10,345 --> 00:01:14,720
S,G and what this does is it measures how similar is

20
00:01:14,720 --> 00:01:20,547
the style of the image G to the style of the image S. Finally,

21
00:01:20,547 --> 00:01:24,180
we'll weight these with two hyper parameters alpha and beta to

22
00:01:24,180 --> 00:01:29,610
specify the relative weighting between the content costs and the style cost.

23
00:01:29,610 --> 00:01:33,405
It seems redundant to use two different hyper parameters to specify

24
00:01:33,405 --> 00:01:44,370
the relative cost of the weighting.

25
00:01:44,370 --> 00:01:47,070
One hyper parameter seems like it would be enough but

26
00:01:47,070 --> 00:01:50,230
the original authors of the Neural Style Transfer Algorithm,

27
00:01:50,230 --> 00:01:52,410
use two different hyper parameters.

28
00:01:52,410 --> 00:01:55,278
I'm just going to follow their convention here.

29
00:01:55,278 --> 00:01:57,925
The Neural Style Transfer Algorithm I'm

30
00:01:57,925 --> 00:02:00,810
going to present in the next few videos is due to Leon Gatys,

31
00:02:00,810 --> 00:02:04,200
Alexander Ecker and Matthias.

32
00:02:04,200 --> 00:02:09,630
Their papers is not too hard to read so after watching these few videos if you wish,

33
00:02:09,630 --> 00:02:14,550
I certainly encourage you to take a look at their paper as well if you want.

34
00:02:14,550 --> 00:02:17,910
The way the algorithm would run is as follows,

35
00:02:17,910 --> 00:02:21,300
having to find the cost function J of G in

36
00:02:21,300 --> 00:02:25,030
order to actually generate a new image what you do is the following.

37
00:02:25,030 --> 00:02:29,035
You would initialize the generated image

38
00:02:29,035 --> 00:02:30,720
G randomly so it might be 100 by 100 by 3 or 500 by 500 by

39
00:02:30,720 --> 00:02:37,200
3 or whatever dimension you want it to be.

40
00:02:37,200 --> 00:02:41,885
Then we'll define the cost function J of G on the previous slide.

41
00:02:41,885 --> 00:02:47,805
What you can do is use gradient descent to minimize this so you can update G as

42
00:02:47,805 --> 00:02:54,300
G minus the derivative respect to the cost function of J of G. In this process,

43
00:02:54,300 --> 00:02:58,770
you're actually updating the pixel values of this image G which is

44
00:02:58,770 --> 00:03:04,175
a 100 by 100 by 3 maybe rgb channel image.

45
00:03:04,175 --> 00:03:10,215
Here's an example, let's say you start with this content image and this style image.

46
00:03:10,215 --> 00:03:13,365
This is a another probably Picasso image.

47
00:03:13,365 --> 00:03:15,855
Then when you initialize G randomly,

48
00:03:15,855 --> 00:03:18,535
you're initial randomly generated image is

49
00:03:18,535 --> 00:03:24,110
just this white noise image with each pixel value chosen at random.

50
00:03:24,110 --> 00:03:25,905
As you run gradient descent,

51
00:03:25,905 --> 00:03:32,325
you minimize the cost function J of G slowly through the pixel value so then you get

52
00:03:32,325 --> 00:03:36,030
slowly an image that looks more and more like

53
00:03:36,030 --> 00:03:40,755
your content image rendered in the style of your style image.

54
00:03:40,755 --> 00:03:44,690
In this video, you saw the overall outline of

55
00:03:44,690 --> 00:03:47,530
the Neural Style Transfer Algorithm where you define

56
00:03:47,530 --> 00:03:51,420
a cost function for the generated image G and minimize it.

57
00:03:51,420 --> 00:03:53,420
Next, we need to see how to define

58
00:03:53,420 --> 00:03:57,210
the content cost function as well as the style cost function.

59
00:03:57,210 --> 00:03:59,530
Let's take a look at that starting in the next video.