1 00:00:00,000 --> 00:00:02,400 To build a Neural Style Transfer system, 2 00:00:02,400 --> 00:00:05,730 let's define a cost function for the generated image. 3 00:00:05,730 --> 00:00:09,953 What you see later is that by minimizing this cost function, 4 00:00:09,953 --> 00:00:12,270 you can generate the image that you want. 5 00:00:12,270 --> 00:00:15,231 Remember what the problem formulation is. 6 00:00:15,231 --> 00:00:17,400 You're given a content image C, 7 00:00:17,400 --> 00:00:21,510 given a style image S and you goal is to generate a new image 8 00:00:21,510 --> 00:00:26,180 G. In order to implement neural style transfer, 9 00:00:26,180 --> 00:00:34,080 what you're going to do is define a cost function J of G that measures how good is 10 00:00:34,080 --> 00:00:37,920 a particular generated image and we'll use gradient to 11 00:00:37,920 --> 00:00:42,425 descent to minimize J of G in order to generate this image. 12 00:00:42,425 --> 00:00:44,490 How good is a particular image? 13 00:00:44,490 --> 00:00:48,460 Well, we're going to define two parts to this cost function. 14 00:00:48,460 --> 00:00:52,190 The first part is called the content cost. 15 00:00:52,190 --> 00:00:56,480 This is a function of the content image and of the generated image and 16 00:00:56,480 --> 00:01:00,960 what it does is it measures how similar is the contents of the generated image 17 00:01:00,960 --> 00:01:05,495 to the content of the content image C. And then going to 18 00:01:05,495 --> 00:01:10,345 add that to a style cost function which is now a function of 19 00:01:10,345 --> 00:01:14,720 S,G and what this does is it measures how similar is 20 00:01:14,720 --> 00:01:20,547 the style of the image G to the style of the image S. Finally, 21 00:01:20,547 --> 00:01:24,180 we'll weight these with two hyper parameters alpha and beta to 22 00:01:24,180 --> 00:01:29,610 specify the relative weighting between the content costs and the style cost. 23 00:01:29,610 --> 00:01:33,405 It seems redundant to use two different hyper parameters to specify 24 00:01:33,405 --> 00:01:44,370 the relative cost of the weighting. 25 00:01:44,370 --> 00:01:47,070 One hyper parameter seems like it would be enough but 26 00:01:47,070 --> 00:01:50,230 the original authors of the Neural Style Transfer Algorithm, 27 00:01:50,230 --> 00:01:52,410 use two different hyper parameters. 28 00:01:52,410 --> 00:01:55,278 I'm just going to follow their convention here. 29 00:01:55,278 --> 00:01:57,925 The Neural Style Transfer Algorithm I'm 30 00:01:57,925 --> 00:02:00,810 going to present in the next few videos is due to Leon Gatys, 31 00:02:00,810 --> 00:02:04,200 Alexander Ecker and Matthias. 32 00:02:04,200 --> 00:02:09,630 Their papers is not too hard to read so after watching these few videos if you wish, 33 00:02:09,630 --> 00:02:14,550 I certainly encourage you to take a look at their paper as well if you want. 34 00:02:14,550 --> 00:02:17,910 The way the algorithm would run is as follows, 35 00:02:17,910 --> 00:02:21,300 having to find the cost function J of G in 36 00:02:21,300 --> 00:02:25,030 order to actually generate a new image what you do is the following. 37 00:02:25,030 --> 00:02:29,035 You would initialize the generated image 38 00:02:29,035 --> 00:02:30,720 G randomly so it might be 100 by 100 by 3 or 500 by 500 by 39 00:02:30,720 --> 00:02:37,200 3 or whatever dimension you want it to be. 40 00:02:37,200 --> 00:02:41,885 Then we'll define the cost function J of G on the previous slide. 41 00:02:41,885 --> 00:02:47,805 What you can do is use gradient descent to minimize this so you can update G as 42 00:02:47,805 --> 00:02:54,300 G minus the derivative respect to the cost function of J of G. In this process, 43 00:02:54,300 --> 00:02:58,770 you're actually updating the pixel values of this image G which is 44 00:02:58,770 --> 00:03:04,175 a 100 by 100 by 3 maybe rgb channel image. 45 00:03:04,175 --> 00:03:10,215 Here's an example, let's say you start with this content image and this style image. 46 00:03:10,215 --> 00:03:13,365 This is a another probably Picasso image. 47 00:03:13,365 --> 00:03:15,855 Then when you initialize G randomly, 48 00:03:15,855 --> 00:03:18,535 you're initial randomly generated image is 49 00:03:18,535 --> 00:03:24,110 just this white noise image with each pixel value chosen at random. 50 00:03:24,110 --> 00:03:25,905 As you run gradient descent, 51 00:03:25,905 --> 00:03:32,325 you minimize the cost function J of G slowly through the pixel value so then you get 52 00:03:32,325 --> 00:03:36,030 slowly an image that looks more and more like 53 00:03:36,030 --> 00:03:40,755 your content image rendered in the style of your style image. 54 00:03:40,755 --> 00:03:44,690 In this video, you saw the overall outline of 55 00:03:44,690 --> 00:03:47,530 the Neural Style Transfer Algorithm where you define 56 00:03:47,530 --> 00:03:51,420 a cost function for the generated image G and minimize it. 57 00:03:51,420 --> 00:03:53,420 Next, we need to see how to define 58 00:03:53,420 --> 00:03:57,210 the content cost function as well as the style cost function. 59 00:03:57,210 --> 00:03:59,530 Let's take a look at that starting in the next video.