Hi. I hope you enjoyed diving into the details of TensorFlow and experimenting with the MNIST dataset, but, as you might have noticed, it can be tedious: it requires duplicate work, duplicate lines of code. Of course, there are frameworks that make your life easier and let you implement common neural networks with much less effort. We'll be using one of them, called Keras. The choice here is a matter of taste and of the particular problem in front of you; we just picked one.

Now, let us begin. We will be using the MNIST dataset of handwritten digits, the one you already know, along with its loader, and we import Keras. We also transform the class labels: we use one-hot encoding to get vectors of zeros and ones from the class labels. With Matplotlib we can plot an example, and here's a five.

As you might have guessed, Keras uses TensorFlow under the hood, so we import that as well, and now we'll build a simple multilayer perceptron. Here, we create a container which will store our layers. We define the input layer, which will accept images of 28 by 28 pixels. Then we flatten them, transforming each two-dimensional matrix into a one-dimensional vector. Then we add two dense layers; if you remember the beginning of this week, a dense layer is just a linear model. Then we add an output layer, so we'll have a neuron for each class, and we apply the softmax function to transform the outputs into probabilities.

The last touch is compiling the model: we pick an optimization algorithm, then we define the loss function. categorical_crossentropy is just the same cross-entropy you're used to, but applied to one-hot encoded vectors, and we define accuracy as the metric.

Good. Now, I have a question for you. How many parameters will such a network have? Let's answer it. Keras has nice summary facilities, so here is our network: we begin with the input and enter the flatten layer, we go through two linear layers, and at the end we add the softmax.
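To make the steps above concrete, here is a minimal sketch of the whole setup, assuming a recent tf.keras. The optimizer ('adam'), the layer width of 256 units, and the added channel axis are illustrative assumptions for this sketch, not choices fixed by the lecture.

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits.
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Scale pixels to [0, 1] and add a channel axis (handy for image tools later).
X_train = (X_train / 255.0).reshape(-1, 28, 28, 1)
X_test = (X_test / 255.0).reshape(-1, 28, 28, 1)

# One-hot encode the labels: class 5 becomes [0,0,0,0,0,1,0,0,0,0].
y_train_oh = to_categorical(y_train, 10)
y_test_oh = to_categorical(y_test, 10)

# Sequential is a container that stores layers in order.
model = Sequential([
    Flatten(input_shape=(28, 28, 1)),  # 2-D image -> 784-dim vector
    Dense(256, activation='linear'),   # dense layer = linear model
    Dense(256, activation='linear'),   # (256 units is an arbitrary pick)
    Dense(10, activation='softmax'),   # one neuron per class -> probabilities
])

# Compiling: pick an optimizer, the loss, and the metrics to track.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # cross-entropy for one-hot labels
              metrics=['accuracy'])

model.summary()  # prints each layer's output shape and parameter count
```

model.summary() also answers the parameter question: each dense layer has (inputs + 1) × units weights, so with these assumed sizes the first one alone has (784 + 1) × 256 = 200,960 parameters.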
The basic training interface is very simple; it's like scikit-learn: we just call fit. Here we run just five passes (epochs), which should be rather fast even without a GPU. And, of course, the interface for probability prediction is very simple as well; here we predict the class probabilities for the first elements. Models can also be saved and loaded with model.save.

Now, we can compute the test accuracy. That's not very good; this is what we get by evaluating that model. What do you think is the problem?

Well, of course, the problem is that we stacked two linear layers together, and, as you know already, two linear layers stacked together are by no means a good learning model: they collapse into a single linear transformation. So if we change the activations from linear to, say, ReLU, we should obtain a much better result. Let's retrain... and there it is: a sudden jump in quality.

Good. Now, one of your assignments will be to tune this network to improve its quality, so I invite you to add layers and to play with activations.

Before we get to actual hacking, there is one more thing. Keras is integrated with TensorBoard, which is sort of fun and was part of the reason for choosing Keras, and the integration is, of course, very easy: you just pass an extra option, a callback, to the fit function (these calls are collected in a sketch below). If we run the training and open TensorBoard, we should see the line graphs: you can follow training in terms of train loss and train accuracy, and the changes in validation loss and validation accuracy. If you want to study Keras internals in more detail, the graph visualization will help you; you can see the graph details here. As you can see, it's a bit unfriendly to a human.

Going back, to summarize: Keras is a high-level framework which makes the construction of neural networks easy. As you can see here, we did almost no unnecessary operations, so each line of code adds something substantial to the model. And, of course, as you learn more about deep learning, you'll also be learning more about how to do it in Keras.
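Continuing the sketch above, here is what the training, prediction, saving, and TensorBoard calls might look like. The log directory, epoch count, and file name are assumptions made for this sketch.

```python
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import load_model

# Five passes (epochs) over the training data; the TensorBoard callback
# writes logs for the dashboard ('/tmp/mnist_logs' is an arbitrary path).
model.fit(X_train, y_train_oh,
          epochs=5,
          validation_data=(X_test, y_test_oh),
          callbacks=[TensorBoard(log_dir='/tmp/mnist_logs')])

# Class probabilities for the first few test images.
print(model.predict(X_test[:3]))

# Test accuracy.
loss, acc = model.evaluate(X_test, y_test_oh)
print('test accuracy:', acc)

# Models can be saved and loaded.
model.save('mnist_mlp.h5')
model = load_model('mnist_mlp.h5')
```

With the linear activations above, the test accuracy stays at linear-model level; swapping activation='linear' for activation='relu' in the two hidden layers is the one-line change the lecture makes to get the jump in quality.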
Now, for your assignment. Your assignment will be to improve the quality of this model. The suggestions here are quite obvious; there are several ways. The first is to add more layers and more parameters. Of course, this will increase the computational cost and create a risk of overfitting, but otherwise it's a classical way of improving a neural network's performance.

Another principle that you should also consider is not running the whole training every time: when you see that the quality improvement has stopped, you should probably stop training. You should also experiment with different nonlinearities and probably different optimization algorithms; some of them converge much faster than others. Then, you could probably add regularization to your loss function, and, of course, Keras provides such functionality.

The last thing, probably not very relevant for these digits, is that you can always get more data for free by using Keras tools that zoom, rotate, and shift images, but keep in mind that these transformations should make sense (see the sketch after this section).

This is all for this week's video materials. I hope you'll find your assignments and exercises enjoyable. Thank you.
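For reference, here is a sketch of the early-stopping and "free data" augmentation ideas from the assignment suggestions, reusing the model and arrays from the earlier sketches. All parameter values (the patience, rotation range, shift fractions, and batch size) are illustrative guesses, not course-specified settings.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stop training once validation loss stops improving for a few epochs,
# instead of always running the full schedule.
early_stop = EarlyStopping(monitor='val_loss', patience=3)

# "Free" extra data: small random rotations, zooms, and shifts.
# The transformations must make sense for digits: a horizontal flip,
# for example, would turn valid digits into invalid ones.
augmenter = ImageDataGenerator(rotation_range=10,
                               zoom_range=0.1,
                               width_shift_range=0.1,
                               height_shift_range=0.1)

# Train on a stream of augmented batches drawn from the training set.
model.fit(augmenter.flow(X_train, y_train_oh, batch_size=64),
          steps_per_epoch=len(X_train) // 64,
          epochs=30,
          validation_data=(X_test, y_test_oh),
          callbacks=[early_stop])
```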