Hi. I hope you enjoyed diving into the details of TensorFlow and experimenting with the MNIST dataset, but, as you might have noticed, it can be tedious: it requires duplicate work, duplicate lines of code. Of course, there are frameworks that make your life easier and let you implement common neural networks with much less effort. We'll be using one of them, called Keras. The choice here is a matter of taste and of the particular problem in front of you; we just picked one.

Now, let us begin. We will be using the MNIST dataset of handwritten digits, the one you already know, along with its loader, and we import Keras. We also transform the class labels: we use one-hot encoding to get vectors of zeros and ones from the class labels. With Matplotlib we can plot an example, and here's a five.

As you might have guessed, Keras uses TensorFlow under the hood, so we import that as well, and now we'll build a simple multilayer perceptron. Here, we create a container which will store our layers. We define the input layer, which will accept images of 28 by 28 pixels. Then we flatten them, transforming each two-dimensional matrix into a one-dimensional vector. Then we add two dense layers; if you remember the beginning of this week, a dense layer is just a linear model. Then we add an output layer, so we'll have a neuron for each class, and we apply the softmax function to transform the outputs into probabilities.

The last touch is compiling the model: we pick an optimization algorithm, then we define the loss function. categorical_crossentropy is just the same cross-entropy you're used to, but applied to one-hot encoded vectors, and we define accuracy as the metric.

Good. Now, I have a question for you. How many parameters will such a network have? Let's answer it. Keras has nice summary facilities, so here is our network: we begin with the input and enter the flatten layer, we go through two linear layers, and at the end we add the softmax.
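To make the steps above concrete, here is a minimal sketch of the whole setup, assuming a recent tf.keras. The optimizer ('adam'), the layer width of 256 units, and the added channel axis are illustrative assumptions for this sketch, not choices fixed by the lecture.

```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits.
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Scale pixels to [0, 1] and add a channel axis (handy for image tools later).
X_train = (X_train / 255.0).reshape(-1, 28, 28, 1)
X_test = (X_test / 255.0).reshape(-1, 28, 28, 1)

# One-hot encode the labels: class 5 becomes [0,0,0,0,0,1,0,0,0,0].
y_train_oh = to_categorical(y_train, 10)
y_test_oh = to_categorical(y_test, 10)

# Sequential is a container that stores layers in order.
model = Sequential([
    Flatten(input_shape=(28, 28, 1)),  # 2-D image -> 784-dim vector
    Dense(256, activation='linear'),   # dense layer = linear model
    Dense(256, activation='linear'),   # (256 units is an arbitrary pick)
    Dense(10, activation='softmax'),   # one neuron per class -> probabilities
])

# Compiling: pick an optimizer, the loss, and the metrics to track.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # cross-entropy for one-hot labels
              metrics=['accuracy'])

model.summary()  # prints each layer's output shape and parameter count
```

model.summary() also answers the parameter question: each dense layer has (inputs + 1) × units weights, so with these assumed sizes the first one alone has (784 + 1) × 256 = 200,960 parameters.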
The basic training interface is very simple; it's like scikit-learn: we just call fit. Here we run just five passes (epochs), which should be rather fast even without a GPU. And, of course, the interface for probability prediction is very simple as well; here we predict the class probabilities for the first elements. Models can also be saved and loaded with model.save.

Now, we can compute the test accuracy. That's not very good; this is what we get by evaluating that model. What do you think is the problem?

Well, of course, the problem is that we stacked two linear layers together, and, as you know already, two linear layers stacked together are by no means a good learning model: they collapse into a single linear transformation. So if we change the activations from linear to, say, ReLU, we should obtain a much better result. Let's retrain... and there it is: a sudden jump in quality.

Good. Now, one of your assignments will be to tune this network to improve its quality, so I invite you to add layers and to play with activations.

Before we get to actual hacking, there is one more thing. Keras is integrated with TensorBoard, which is sort of fun and was part of the reason for choosing Keras, and the integration is, of course, very easy: you just pass an extra option, a callback, to the fit function (these calls are collected in a sketch below). If we run the training and open TensorBoard, we should see the line graphs: you can follow training in terms of train loss and train accuracy, and the changes in validation loss and validation accuracy. If you want to study Keras internals in more detail, the graph visualization will help you; you can see the graph details here. As you can see, it's a bit unfriendly to a human.

Going back, to summarize: Keras is a high-level framework which makes the construction of neural networks easy. As you can see here, we did almost no unnecessary operations, so each line of code adds something substantial to the model. And, of course, as you learn more about deep learning, you'll also be learning more about how to do it in Keras.
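Continuing the sketch above, here is what the training, prediction, saving, and TensorBoard calls might look like. The log directory, epoch count, and file name are assumptions made for this sketch.

```python
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.models import load_model

# Five passes (epochs) over the training data; the TensorBoard callback
# writes logs for the dashboard ('/tmp/mnist_logs' is an arbitrary path).
model.fit(X_train, y_train_oh,
          epochs=5,
          validation_data=(X_test, y_test_oh),
          callbacks=[TensorBoard(log_dir='/tmp/mnist_logs')])

# Class probabilities for the first few test images.
print(model.predict(X_test[:3]))

# Test accuracy.
loss, acc = model.evaluate(X_test, y_test_oh)
print('test accuracy:', acc)

# Models can be saved and loaded.
model.save('mnist_mlp.h5')
model = load_model('mnist_mlp.h5')
```

With the linear activations above, the test accuracy stays at linear-model level; swapping activation='linear' for activation='relu' in the two hidden layers is the one-line change the lecture makes to get the jump in quality.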
Now, for your assignment. Your assignment will be to improve the quality of this model. The suggestions here are quite obvious; there are several ways. The first is to add more layers and more parameters. Of course, this will increase the computational cost and create a risk of overfitting, but otherwise it's a classical way of improving a neural network's performance.

Another principle that you should also consider is not running the whole training every time: when you see that the quality improvement has stopped, you should probably stop training. You should also experiment with different nonlinearities and probably different optimization algorithms; some of them converge much faster than others. Then, you could probably add regularization to your loss function, and, of course, Keras provides such functionality.

The last thing, probably not very relevant for these digits, is that you can always get more data for free by using Keras tools that zoom, rotate, and shift images, but keep in mind that these transformations should make sense (see the sketch after this section).

This is all for this week's video materials. I hope you'll find your assignments and exercises enjoyable. Thank you.
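For reference, here is a sketch of the early-stopping and "free data" augmentation ideas from the assignment suggestions, reusing the model and arrays from the earlier sketches. All parameter values (the patience, rotation range, shift fractions, and batch size) are illustrative guesses, not course-specified settings.

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stop training once validation loss stops improving for a few epochs,
# instead of always running the full schedule.
early_stop = EarlyStopping(monitor='val_loss', patience=3)

# "Free" extra data: small random rotations, zooms, and shifts.
# The transformations must make sense for digits: a horizontal flip,
# for example, would turn valid digits into invalid ones.
augmenter = ImageDataGenerator(rotation_range=10,
                               zoom_range=0.1,
                               width_shift_range=0.1,
                               height_shift_range=0.1)

# Train on a stream of augmented batches drawn from the training set.
model.fit(augmenter.flow(X_train, y_train_oh, batch_size=64),
          steps_per_epoch=len(X_train) // 64,
          epochs=30,
          validation_data=(X_test, y_test_oh),
          callbacks=[early_stop])
```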