1
00:00:00,000 --> 00:00:04,284
[MUSIC]

2
00:00:04,284 --> 00:00:09,149
Hi, in this lecture, we will study
the hyperparameter optimization process and

3
00:00:09,149 --> 00:00:13,530
talk about hyperparameters in
specific libraries and models.

4
00:00:13,530 --> 00:00:17,430
We will first discuss
hyperparameter tuning in general.

5
00:00:17,430 --> 00:00:22,220
The general pipeline, ways to tune
hyperparameters, and what it actually

6
00:00:22,220 --> 00:00:26,800
means to understand how a particular
hyperparameter influences the model.

7
00:00:28,790 --> 00:00:31,100
It is actually what we will
discuss in this video, and

8
00:00:31,100 --> 00:00:33,900
then we will talk about libraries and
frameworks, and

9
00:00:33,900 --> 00:00:37,440
see how to tune hyperparameters
of several types of models.

10
00:00:39,090 --> 00:00:41,651
Namely, we will first
study tree-based models,

11
00:00:41,651 --> 00:00:45,380
gradient boosting decision trees and
RandomForest.

12
00:00:45,380 --> 00:00:49,060
Then I'll review important
hyperparameters in neural nets.

13
00:00:49,060 --> 00:00:53,700
And finally, we will talk about
linear models, where to find them and

14
00:00:53,700 --> 00:00:54,450
how to tune them.

15
00:00:55,750 --> 00:00:58,580
Another class of interesting
models is factorization machines.

16
00:00:59,600 --> 00:01:02,190
We will not discuss factorization
machines in this lecture,

17
00:01:02,190 --> 00:01:04,980
but I suggest you read
about them on the internet.

18
00:01:06,630 --> 00:01:11,780
So, let's start with a general
discussion of the model tuning process.

19
00:01:12,800 --> 00:01:16,859
What are the most important things to
understand when tuning hyperparameters?

20
00:01:18,180 --> 00:01:22,640
First, there are tons of potential
parameters to tune in every model.

21
00:01:22,640 --> 00:01:26,380
And so we need to realize which
parameters affect the model most.

22
00:01:27,590 --> 00:01:30,470
Of course,
all the parameters are relevant,

23
00:01:30,470 --> 00:01:35,050
but we kind of need to select
the most important ones.

24
00:01:35,050 --> 00:01:38,980
Anyway, we never have time to tune
all the params, right?

25
00:01:38,980 --> 00:01:42,665
So we need to come up with a nice
subset of parameters to tune.

26
00:01:42,665 --> 00:01:44,805
Suppose we're new to XGBoost and

27
00:01:44,805 --> 00:01:49,088
we're trying to find out which
parameters are better to tune, and

28
00:01:49,088 --> 00:01:53,920
say we don't even understand how
a gradient boosting decision tree works.

29
00:01:55,080 --> 00:01:59,906
We can always look up which parameters
people usually set when using XGBoost.

30
00:01:59,906 --> 00:02:01,086
It's quite easy to look up, right?

31
00:02:01,086 --> 00:02:05,290
For example, on GitHub or Kaggle Kernels.

32
00:02:06,900 --> 00:02:10,750
Finally, the documentation sometimes
explicitly states which parameter

33
00:02:10,750 --> 00:02:11,670
to tune first.

34
00:02:13,010 --> 00:02:16,760
For the selected set of parameters,
we should then understand

35
00:02:16,760 --> 00:02:19,490
what would happen if we
change one of the parameters?

36
00:02:20,520 --> 00:02:25,840
How will the training process and the training
and validation curves change if we,

37
00:02:25,840 --> 00:02:28,390
for example,
increase a certain parameter?

38
00:02:30,090 --> 00:02:35,030
And finally, actually tune
the selected parameters, right?

39
00:02:35,030 --> 00:02:38,070
Most people do it manually.

40
00:02:38,070 --> 00:02:41,700
Just run, examine the logs,
change parameters,

41
00:02:41,700 --> 00:02:45,230
run again and
iterate until good parameters are found.

42
00:02:45,230 --> 00:02:50,063
It is also possible to use
hyperparameter optimization

43
00:02:50,063 --> 00:02:53,900
tools like hyperopt, but it's usually
faster to do it manually, to be honest.

44
00:02:55,090 --> 00:02:59,550
So later in this video, we will actually discuss
the most important parameters for

45
00:02:59,550 --> 00:03:05,400
some models, along with some intuition on how
to tune the parameters of those models.

46
00:03:06,960 --> 00:03:10,870
But before we start, I actually want
to give you a list of libraries

47
00:03:10,870 --> 00:03:14,970
that you can use for
automatic hyperparameter tuning.

48
00:03:16,170 --> 00:03:20,580
There are lots of them actually, and
I didn't try everything from this list

49
00:03:20,580 --> 00:03:25,420
myself, but from what I actually tried,
I did not notice much

50
00:03:25,420 --> 00:03:29,510
difference in optimization speed on
real tasks between the libraries.

51
00:03:29,510 --> 00:03:33,780
But if you have time,
you can try every library and compare.

52
00:03:35,250 --> 00:03:39,928
From the user's side, these
libraries are very easy to use.

53
00:03:39,928 --> 00:03:41,501
First, we need to define a function
that will run our model,

54
00:03:41,501 --> 00:03:42,264
in this case XGBoost,

55
00:03:42,264 --> 00:03:48,436
with the given set of parameters and

56
00:03:48,436 --> 00:03:52,520
return the resulting validation score.

57
00:03:54,090 --> 00:03:57,530
And second,
we need to specify a search space.

58
00:03:57,530 --> 00:04:03,420
The range for the hyperparameters where
we want to look for the solution.

59
00:04:03,420 --> 00:04:07,624
For example, here we see that the parameter
eta is fixed at 0.1.

60
00:04:07,624 --> 00:04:13,744
And we think that optimal max depth
is somewhere between 10 and 30.

61
00:04:15,690 --> 00:04:18,730
And actually that is it,
we are ready to run hyperopt.
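
As a rough sketch of the two steps just described, here is a function that returns a validation score plus a search space. This is not hyperopt itself: a plain random search from the Python standard library stands in for it, and `validation_score` is a hypothetical toy function used in place of actually training XGBoost. Only the fixed eta of 0.1 and the max depth range of 10 to 30 come from the lecture.

```python
import random

# Search space: eta is fixed at 0.1, max_depth is searched between 10 and 30.
SEARCH_SPACE = {
    "eta": lambda rng: 0.1,
    "max_depth": lambda rng: rng.randint(10, 30),
}

def validation_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data.

    Returns a loss (lower is better). The formula is a toy assumption."""
    return (params["max_depth"] - 17) ** 2 / 100.0 + params["eta"]

def random_search(objective, space, n_trials=50, seed=0):
    """Sample parameter sets from the space and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(validation_score, SEARCH_SPACE)
print(best_params, best_score)
```

With a real library, the objective would train the model and the optimizer would choose where to sample next, but the inputs keep the same shape: an objective function and a search space.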

62
00:04:18,730 --> 00:04:24,410
It can take a long time, so
the best strategy is to run it overnight.

63
00:04:26,070 --> 00:04:31,260
And also please note that all
we need to know about the hyperparameters,

64
00:04:31,260 --> 00:04:35,220
in this case,
is an adequate range for the search.

65
00:04:35,220 --> 00:04:40,560
That's pretty convenient
if you don't know a new model and

66
00:04:40,560 --> 00:04:43,350
you just want to try running it.

67
00:04:43,350 --> 00:04:45,800
But still,
most people tune the models manually.

68
00:04:47,850 --> 00:04:52,020
So, what exactly does it
mean to understand how

69
00:04:52,020 --> 00:04:54,350
a parameter influences the model?

70
00:04:55,590 --> 00:04:56,990
Broadly speaking,

71
00:04:56,990 --> 00:05:02,570
different values of parameters result
in three different fitting behaviors.

72
00:05:02,570 --> 00:05:05,180
First, a model can underfit.

73
00:05:06,270 --> 00:05:10,980
That is, it is so constrained that
it cannot even learn the train set.

74
00:05:12,400 --> 00:05:16,440
Another possibility is that
the model is so powerful that

75
00:05:16,440 --> 00:05:20,519
it just overfits to the train set and
is not able to generalize at all.

76
00:05:21,630 --> 00:05:22,570
And finally,

77
00:05:22,570 --> 00:05:26,810
the third behavior is something
that we are actually looking for.

78
00:05:26,810 --> 00:05:29,760
It's somewhere between underfitting and
overfitting.

79
00:05:31,546 --> 00:05:37,280
So basically,
while tuning parameters,

80
00:05:37,280 --> 00:05:43,140
we should try to understand if the model
is currently underfitting or overfitting.

81
00:05:43,140 --> 00:05:46,320
And then, we should somehow
adjust the parameters to get

82
00:05:46,320 --> 00:05:48,030
closer to the desired behavior.
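
One simple way to picture this check is to compare the training loss with the validation loss. The sketch below is only an illustration: the function name `diagnose_fit` and both thresholds are assumptions, not something given in the lecture, and sensible thresholds depend on the task and the metric.

```python
def diagnose_fit(train_loss, valid_loss, acceptable_loss=0.2, max_gap=0.05):
    """Classify fitting behavior from two losses (lower is better).

    Both thresholds are illustrative assumptions."""
    # The model cannot even learn the train set.
    if train_loss > acceptable_loss:
        return "underfitting"
    # The train set is learned, but validation is much worse.
    if valid_loss - train_loss > max_gap:
        return "overfitting"
    return "good fit"

print(diagnose_fit(0.45, 0.47))  # underfitting
print(diagnose_fit(0.02, 0.30))  # overfitting
print(diagnose_fit(0.15, 0.18))  # good fit
```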

83
00:05:49,580 --> 00:05:53,880
We need to kind of split all the
parameters that we would like to tune into

84
00:05:53,880 --> 00:05:55,700
two groups.

85
00:05:55,700 --> 00:05:59,910
In the first group, we'll have
the parameters that constrain the model.

86
00:05:59,910 --> 00:06:03,970
So if we increase
the parameter from that group,

87
00:06:03,970 --> 00:06:09,440
the model would change its behavior
from overfitting to underfitting.

88
00:06:09,440 --> 00:06:13,909
The larger the value of the parameter,
the heavier the constraint.

89
00:06:13,909 --> 00:06:18,981
In the following videos, we'll color such
parameters in red, and the parameters

90
00:06:18,981 --> 00:06:23,407
in the second group do the opposite
thing to our training process.

91
00:06:23,407 --> 00:06:27,660
The higher the value,
the more powerful the model.

92
00:06:28,760 --> 00:06:31,090
And so by increasing such parameters,

93
00:06:31,090 --> 00:06:35,310
we can change fitting behavior
from underfitting to overfitting.

94
00:06:36,320 --> 00:06:38,810
We will use green color for
such parameters.
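
The color coding can be written down as a small lookup, sketched below. The XGBoost parameter names are assumed examples chosen for illustration (this video does not list them), and `suggest_changes` is a hypothetical helper: it only says in which direction to move each parameter, not by how much.

```python
# "red" parameters constrain the model, "green" parameters make it more powerful.
# The parameter names are assumed examples, not a list from this video.
PARAM_COLORS = {
    "max_depth": "green",
    "min_child_weight": "red",
    "lambda": "red",
}

def suggest_changes(behavior, colors):
    """Map each parameter to the direction that moves the model toward a good fit."""
    if behavior == "underfitting":
        moves = {"green": "increase", "red": "decrease"}
    elif behavior == "overfitting":
        moves = {"green": "decrease", "red": "increase"}
    else:
        return {}  # nothing to change for a good fit
    return {name: moves[color] for name, color in colors.items()}

print(suggest_changes("overfitting", PARAM_COLORS))
```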

95
00:06:40,570 --> 00:06:43,600
So, in this video we discussed some

96
00:06:43,600 --> 00:06:47,150
general aspects of
hyperparameter optimization.

97
00:06:47,150 --> 00:06:51,600
Most importantly,
we've defined the color coding.

98
00:06:51,600 --> 00:06:53,940
If you did not understand
what color stands for

99
00:06:53,940 --> 00:06:57,710
what, please watch that part of
the video again.

100
00:06:58,750 --> 00:07:01,860
We'll use this color coding
throughout the following videos.

101
00:07:01,860 --> 00:07:11,860
[MUSIC]