In the last video, we talked about the hypothesis representation for logistic regression. What I'd like to do now is tell you about something called the decision boundary, and this will give us a better sense of what the logistic regression hypothesis function is computing.

To recap, this is what we wrote out last time, where we said that the hypothesis is represented as h(x) = g(theta transpose x), where g is this function called the sigmoid function, which looks like this. So, it slowly increases from zero to one, asymptoting at one. What I want to do now is try to understand better when this hypothesis will make predictions that y is equal to one versus when it might make predictions that y is equal to zero, and understand better what the hypothesis function looks like, particularly when we have more than one feature.

Concretely, this hypothesis is outputting estimates of the probability that y is equal to one given x, parameterized by theta. So if we wanted to predict whether y is equal to one or y is equal to zero, here's something we might do. Whenever the hypothesis estimates that the probability of y being one is greater than or equal to 0.5, so that it is more likely to be y equals one than y equals zero, then let's predict y equals one. And otherwise, if the estimated probability of y being one is less than 0.5, then let's predict y equals zero.
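To make that threshold rule concrete, here is a minimal Python sketch (not from the lecture itself); it assumes the feature vector x already includes the intercept term x0 = 1, and the function names are just illustrative:

```python
import numpy as np

def sigmoid(z):
    """The sigmoid (logistic) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    """Predict y = 1 when h(x) = g(theta' x) >= 0.5, and y = 0 otherwise."""
    return 1 if sigmoid(theta @ x) >= 0.5 else 0
```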
And notice I chose a greater-than-or-equal-to here and a less-than here. If h(x) is equal to 0.5 exactly, then we could predict either positive or negative, but because I put the greater-than-or-equal-to here, we default to predicting positive if h(x) is 0.5. But that's a detail that really doesn't matter that much.

What I want to do is understand better when it is exactly that h(x) will be greater than or equal to 0.5, so that we end up predicting y is equal to one. If we look at this plot of the sigmoid function, we'll notice that the sigmoid function, g(z), is greater than or equal to 0.5 whenever z is greater than or equal to zero. So it is in this half of the figure that g takes on values that are 0.5 and higher; right here, where z equals zero, g(z) is exactly 0.5. So when z is positive, g(z), the sigmoid function, is greater than or equal to 0.5.

Since the hypothesis for logistic regression is h(x) = g(theta transpose x), the hypothesis is therefore going to be greater than or equal to 0.5 whenever theta transpose x is greater than or equal to zero, because here theta transpose x takes the role of z. So what we've shown is that our hypothesis is going to predict y equals one whenever theta transpose x is greater than or equal to 0.
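In code, that observation means the sigmoid never actually needs to be evaluated to make a prediction; a small sketch of the equivalent rule, assuming the same NumPy setup as above:

```python
import numpy as np

def predict_from_linear_term(theta, x):
    # g(theta' x) >= 0.5 if and only if theta' x >= 0, so checking the
    # sign of theta' x gives the same prediction without computing g at all.
    return 1 if theta @ x >= 0 else 0
```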
Let's now consider the other case, of when the hypothesis will predict y is equal to 0. Well, by a similar argument, h(x) is going to be less than 0.5 whenever g(z) is less than 0.5, and the range of values of z that causes g(z) to take on values less than 0.5, well, that's when z is negative. So when g(z) is less than 0.5, our hypothesis will predict that y is equal to zero, and by a similar argument to what we had earlier, h(x) = g(theta transpose x), and so we'll predict y equals zero whenever this quantity theta transpose x is less than zero.

To summarize what we just worked out: we saw that if we decide to predict whether y is equal to one or y is equal to zero depending on whether the estimated probability is greater than or equal to 0.5 or less than 0.5, then that's the same as saying that we'll predict y equals 1 whenever theta transpose x is greater than or equal to 0, and we'll predict y equals zero whenever theta transpose x is less than zero.

Let's use this to better understand how the hypothesis of logistic regression makes those predictions. Now, let's suppose we have a training set like that shown on the slide, and suppose our hypothesis is h(x) = g(theta 0 plus theta 1 x1 plus theta 2 x2). We haven't talked yet about how to fit the parameters of this model; we'll talk about that in the next video. But suppose that, via a procedure to be specified, we end up choosing the following values for the parameters. Let's say we choose theta 0 equals minus three, theta 1 equals one, theta 2 equals one. So this means that my parameter vector is going to be theta = [-3, 1, 1].

So, given this choice of my hypothesis parameters, let's try to figure out where the hypothesis will end up predicting y equals 1 and where it will end up predicting y equals 0.
Using the formulas that we worked out on the previous slide, we know that y equals 1 is more likely, that is, the probability that y equals 1 is greater than or equal to 0.5, whenever theta transpose x is greater than or equal to zero. And this formula that I just underlined, minus three plus x1 plus x2, is, of course, theta transpose x when theta is equal to this value of the parameters that we just chose. So, for any example with features x1 and x2 that satisfy this equation, that minus 3 plus x1 plus x2 is greater than or equal to 0, our hypothesis will think that y equals 1 is more likely, or will predict that y is equal to one.

We can also take the minus three and bring it to the right, and rewrite this as x1 plus x2 is greater than or equal to three. And so, equivalently, we found that this hypothesis will predict y equals one whenever x1 plus x2 is greater than or equal to three.

Let's see what that means on the figure. If I write down the equation x1 plus x2 equals three, this defines the equation of a straight line. And if I draw what that straight line looks like, it gives me the following line, which passes through 3 and 3 on the x1 and the x2 axes. So the part of the input space, the part of the x1, x2 plane, that corresponds to when x1 plus x2 is greater than or equal to three, that's going to be everything up and to the upper right of this magenta line that I just drew. And so, the region where our hypothesis will predict y equals 1 is this region, you know, really this huge region, this half-space over to the upper right.
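Here is a small sketch of this particular example in Python; the test points used in the demo are made up purely for illustration:

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])   # [theta0, theta1, theta2] from this example

def predict(x1, x2):
    # Features are [x0, x1, x2] with x0 = 1, so theta' x = -3 + x1 + x2.
    return 1 if theta @ np.array([1.0, x1, x2]) >= 0 else 0

print(predict(2.5, 2.5))  # x1 + x2 = 5 >= 3: upper-right region, predicts 1
print(predict(1.0, 1.0))  # x1 + x2 = 2 <  3: lower-left region, predicts 0
# Points with x1 + x2 exactly 3 lie on the decision boundary (h(x) = 0.5).
```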
And let me just write that down: I'm going to call this the y equals one region. And in contrast, the region where x1 plus x2 is less than three, that's where we will predict that y is equal to zero, and that corresponds to this region. You know, it's really a half-plane, but that region on the left is the region where our hypothesis predicts y equals 0.

I want to give this line, this magenta line that I drew, a name. This line is called the decision boundary. And concretely, this straight line, x1 plus x2 equals 3, corresponds to the set of points, so it corresponds to the region, where h(x) is equal to 0.5 exactly. And the decision boundary, that is, this straight line, is the line that separates the region where the hypothesis predicts y equals one from the region where the hypothesis predicts that y is equal to 0.

And just to be clear: the decision boundary is a property of the hypothesis, including the parameters theta 0, theta 1, theta 2. In the figure I drew a training set, I drew a data set, in order to help the visualization. But even if we take away the data set, this decision boundary, and the region where we predict y equals 1 versus y equals zero, that's a property of the hypothesis and of the parameters of the hypothesis, and not a property of the data set. Later on, of course, we'll talk about how to fit the parameters, and there we'll end up using the training set, using our data, to determine the value of the parameters.
But once we have particular values for the parameters theta 0, theta 1, theta 2, then that completely defines the decision boundary, and we don't actually need to plot a training set in order to plot the decision boundary.

Let's now look at a more complex example where, as usual, I have crosses to denote my positive examples and O's to denote my negative examples. Given a training set like this, how can I get logistic regression to fit this sort of data? Earlier, when we were talking about polynomial regression, or when we were using linear regression, we talked about how we can add extra higher-order polynomial terms to the features. And we can do the same for logistic regression.

Concretely, let's say my hypothesis looks like this, where I've added two extra features, x1 squared and x2 squared, to my features, so that I now have 5 parameters, theta 0 through theta 4. As before, we'll defer to the next video our discussion of how to automatically choose values for the parameters theta 0 through theta 4. But let's say that, via a procedure to be specified, I end up choosing theta 0 equals minus 1, theta 1 equals 0, theta 2 equals 0, theta 3 equals 1, and theta 4 equals 1. What this means is that with this particular choice of parameters, my parameter vector theta looks like [-1, 0, 0, 1, 1].

Following our earlier discussion, this means that my hypothesis will predict that y is equal to 1 whenever minus 1 plus x1 squared plus x2 squared is greater than or equal to 0; this is whenever theta transpose x, theta transpose times my features, is greater than or equal to 0. And if I take the minus 1 and just bring it to the right, I'm saying that my hypothesis will predict that y is equal to 1 whenever x1 squared plus x2 squared is greater than or equal to 1.
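A corresponding sketch for this example with polynomial features, again with made-up test points for illustration:

```python
import numpy as np

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # [theta0, ..., theta4] from this example

def predict(x1, x2):
    # Features are [1, x1, x2, x1^2, x2^2], so theta' x = -1 + x1^2 + x2^2.
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return 1 if theta @ features >= 0 else 0

print(predict(2.0, 0.0))   # x1^2 + x2^2 = 4   >= 1: outside the circle, predicts 1
print(predict(0.5, 0.5))   # x1^2 + x2^2 = 0.5 <  1: inside the circle, predicts 0
```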
So, what does the decision boundary look like? Well, if you were to plot the curve for x1 squared plus x2 squared equals 1, some of you will recognize that as the equation for a circle of radius 1 centered around the origin. So, that is my decision boundary. And everything outside the circle, I'm going to predict as y equals 1. So out here is, you know, my y equals 1 region; I'm going to predict y equals 1 out here. And inside the circle is where I'll predict y is equal to 0.

So, by adding these more complex, or these polynomial, terms to my features, I can get more complex decision boundaries that don't just try to separate the positive and negative examples with a straight line. I can get, in this example, a decision boundary that's a circle. Once again, the decision boundary is a property, not of the training set, but of the hypothesis and of the parameters. So long as we're given my parameter vector theta, that defines the decision boundary, which is the circle. But the training set is not what we use to define the decision boundary. The training set may be used to fit the parameters theta; we'll talk about how to do that later. But once you have the parameters theta, that is what defines the decision boundary. Let me put back the training set, just for visualization.

And finally, let's look at a more complex example. So can we come up with even more complex decision boundaries than this? If I have even higher-order polynomial terms, so things like x1 squared, x1 squared x2, x1 squared x2 squared, and so on.
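One way to generate such higher-order terms programmatically is a feature-mapping helper like the following sketch; the name map_features and the default degree are my own illustrative choices, not something specified in the lecture:

```python
import numpy as np

def map_features(x1, x2, degree=3):
    """Map (x1, x2) to all polynomial terms x1^i * x2^j with i + j <= degree.
    The constant term (i = j = 0) serves as the intercept feature."""
    terms = []
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            terms.append((x1 ** i) * (x2 ** j))
    return np.array(terms)

# For degree 3 this yields 10 features: 1, x2, x2^2, x2^3, x1, x1*x2, ...
```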
If I have much higher-order polynomials like these, then it's possible to show that you can get even more complex decision boundaries, and logistic regression can be used to find decision boundaries that may, for example, be an ellipse like that, or, maybe with a different setting of the parameters, you can instead get a different decision boundary that may even look like, you know, some funny shape like that. Or, for even more complex examples, you can also get decision boundaries that look like, you know, more complex shapes like that, where everything in here you predict y equals 1, and everything outside you predict y equals 0. So with these higher-order polynomial features, you can get very complex decision boundaries.

So with these visualizations, I hope that gives you a sense of the range of hypothesis functions we can represent using the representation that we have for logistic regression. Now that we know what h(x) can represent, what I'd like to do next, in the following video, is talk about how to automatically choose the parameters theta, so that, given a training set, we can automatically fit the parameters to our data.