1
00:00:00,380 --> 00:00:01,550
In this video, we'll talk about

2
00:00:01,670 --> 00:00:02,690
the second major type of machine

3
00:00:03,010 --> 00:00:05,030
learning problem, called Unsupervised Learning.

4
00:00:06,300 --> 00:00:08,500
In the last video, we talked about Supervised Learning.

5
00:00:09,250 --> 00:00:10,700
Back then, recall we had data sets

6
00:00:11,020 --> 00:00:12,670
that look like this, where each

7
00:00:12,890 --> 00:00:15,150
example was labeled either

8
00:00:15,610 --> 00:00:16,900
as a positive or negative example,

9
00:00:17,530 --> 00:00:19,800
whether it was a benign or a malignant tumor.

10
00:00:20,850 --> 00:00:21,920
So for each example in Supervised

11
00:00:22,410 --> 00:00:24,270
Learning, we were told explicitly what

12
00:00:24,440 --> 00:00:25,760
is the so-called right answer,

13
00:00:26,490 --> 00:00:27,580
whether it's benign or malignant.

14
00:00:28,550 --> 00:00:30,210
In Unsupervised Learning, we're given

15
00:00:30,540 --> 00:00:31,720
data that looks different:

16
00:00:31,950 --> 00:00:32,910
data that looks like

17
00:00:33,190 --> 00:00:34,600
this, that doesn't have

18
00:00:34,720 --> 00:00:35,920
any labels, or that all

19
00:00:36,130 --> 00:00:37,460
has the same label, or really no labels.

20
00:00:39,680 --> 00:00:40,740
So we're given the data set and

21
00:00:40,980 --> 00:00:42,460
we're not told what to

22
00:00:42,560 --> 00:00:43,290
do with it and we're not

23
00:00:43,640 --> 00:00:44,800
told what each data point is.

24
00:00:45,290 --> 00:00:47,190
Instead we're just told, here is a data set.

25
00:00:47,870 --> 00:00:49,650
Can you find some structure in the data?

26
00:00:50,480 --> 00:00:51,670
Given this data set, an

27
00:00:52,350 --> 00:00:53,940
Unsupervised Learning algorithm might decide that

28
00:00:54,060 --> 00:00:56,090
the data lives in two different clusters.

29
00:00:56,800 --> 00:00:57,960
And so there's one cluster

30
00:00:59,120 --> 00:00:59,910
and there's a different cluster.

31
00:01:01,110 --> 00:01:02,710
And yes, an Unsupervised Learning algorithm may

32
00:01:03,040 --> 00:01:05,070
break this data into these two separate clusters.

33
00:01:06,410 --> 00:01:08,000
So this is called a clustering algorithm.
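To make the idea of a clustering algorithm concrete, here is a minimal sketch using k-means, one common clustering method. The lecture does not name a specific algorithm at this point, so k-means, the function name `kmeans`, and the sample points are all illustrative assumptions rather than the method shown on the slide.

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Hypothetical unlabeled data: two well-separated groups of points.
points = [(1.0, 1.2), (0.8, 1.0), (1.3, 0.9), (1.1, 1.4),
          (5.0, 5.1), (5.3, 4.8), (4.7, 5.2), (5.1, 5.4)]
labels, centroids = kmeans(points, k=2)
print(labels)     # one cluster index per point
print(centroids)  # the two cluster centers that were found
```

Note that the algorithm is never told which group a point belongs to; it only receives the coordinates and the number of clusters to look for.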

34
00:01:08,860 --> 00:01:10,310
And this turns out to be used in many places.

35
00:01:11,930 --> 00:01:13,310
One example where clustering

36
00:01:13,530 --> 00:01:14,860
is used is in Google

37
00:01:15,060 --> 00:01:16,160
News and if you have not

38
00:01:16,360 --> 00:01:17,320
seen this before, you can actually

39
00:01:18,210 --> 00:01:19,040
go to this URL news.google.com

40
00:01:19,830 --> 00:01:20,460
to take a look.

41
00:01:21,280 --> 00:01:22,970
What Google News does is every day

42
00:01:23,480 --> 00:01:24,220
it goes and looks at tens

43
00:01:24,470 --> 00:01:25,430
of thousands or hundreds of

44
00:01:25,720 --> 00:01:26,740
thousands of news stories on the

45
00:01:26,800 --> 00:01:29,410
web and it groups them into cohesive news stories.

46
00:01:30,730 --> 00:01:31,690
For example, let's look here.

47
00:01:33,380 --> 00:01:35,370
The URLs here link

48
00:01:35,910 --> 00:01:37,260
to different news stories

49
00:01:38,010 --> 00:01:40,110
about the BP Oil Well story.

50
00:01:41,300 --> 00:01:42,160
So, let's click on

51
00:01:42,260 --> 00:01:43,090
one of these URLs and we'll

52
00:01:43,550 --> 00:01:44,780
click on one of these URLs.

53
00:01:45,100 --> 00:01:46,970
What I'll get to is a web page like this.

54
00:01:47,210 --> 00:01:48,390
Here's a Wall Street

55
00:01:48,590 --> 00:01:50,180
Journal article about, you know, the BP

56
00:01:51,110 --> 00:01:52,530
Oil Well Spill, titled

57
00:01:52,920 --> 00:01:54,350
"BP Kills Macondo",

58
00:01:54,590 --> 00:01:55,700
which is the name of the

59
00:01:55,980 --> 00:01:57,960
well, and if you

60
00:01:58,020 --> 00:01:59,360
click on a different URL

61
00:02:00,690 --> 00:02:02,500
from that group, then you might get a different story.

62
00:02:02,950 --> 00:02:04,760
Here's the CNN story about,

63
00:02:04,820 --> 00:02:06,090
again, the BP Oil Spill,

64
00:02:07,090 --> 00:02:08,180
and if you click on yet

65
00:02:08,740 --> 00:02:10,990
a third link, then you might get a different story.

66
00:02:11,440 --> 00:02:13,380
Here's the UK Guardian story

67
00:02:13,940 --> 00:02:15,510
about the BP Oil Spill.

68
00:02:16,530 --> 00:02:17,790
So what Google News has done

69
00:02:17,990 --> 00:02:19,440
is look at tens of thousands of

70
00:02:19,490 --> 00:02:22,170
news stories and automatically cluster them together.

71
00:02:23,030 --> 00:02:24,660
So, the news stories that are all

72
00:02:25,080 --> 00:02:27,010
about the same topic get displayed together.

73
00:02:27,210 --> 00:02:29,170
It turns out that

74
00:02:29,380 --> 00:02:31,020
clustering algorithms and Unsupervised Learning

75
00:02:31,530 --> 00:02:33,550
algorithms are used in many other problems as well.

76
00:02:35,320 --> 00:02:36,690
Here's one on understanding genomics.

77
00:02:38,270 --> 00:02:40,510
Here's an example of DNA microarray data.

78
00:02:40,990 --> 00:02:42,230
The idea is to take

79
00:02:42,430 --> 00:02:44,360
a group of different individuals and

80
00:02:44,510 --> 00:02:45,590
for each of them, you measure

81
00:02:46,100 --> 00:02:48,580
how much they do or do not have a certain gene.

82
00:02:49,050 --> 00:02:51,640
Technically you measure how much certain genes are expressed.

83
00:02:52,000 --> 00:02:54,190
So these colors, red, green,

84
00:02:54,930 --> 00:02:56,210
gray and so on, they

85
00:02:56,340 --> 00:02:57,500
show the degree to which

86
00:02:57,780 --> 00:02:59,440
different individuals do or

87
00:02:59,510 --> 00:03:01,270
do not have a specific gene.

88
00:03:02,500 --> 00:03:03,400
And what you can do is then

89
00:03:03,610 --> 00:03:05,070
run a clustering algorithm to group

90
00:03:05,380 --> 00:03:07,140
individuals into different categories

91
00:03:07,780 --> 00:03:08,810
or into different types of people.

92
00:03:10,230 --> 00:03:11,660
So this is Unsupervised Learning because

93
00:03:11,930 --> 00:03:14,010
we're not telling the algorithm in advance

94
00:03:14,590 --> 00:03:15,690
that these are type 1 people,

95
00:03:16,130 --> 00:03:17,420
those are type 2 people, those

96
00:03:17,560 --> 00:03:18,650
are type 3 people, and so

97
00:03:19,610 --> 00:03:22,390
on. Instead, what we're saying is: here's a bunch of data.

98
00:03:23,110 --> 00:03:24,030
I don't know what's in this data.

99
00:03:24,750 --> 00:03:25,870
I don't know who's in what type.

100
00:03:26,150 --> 00:03:26,940
I don't even know what the different

101
00:03:27,260 --> 00:03:28,480
types of people are, but can

102
00:03:28,610 --> 00:03:30,210
you automatically find structure in

103
00:03:30,360 --> 00:03:31,260
the data, and can you automatically

104
00:03:32,180 --> 00:03:33,620
cluster the individuals into these types

105
00:03:33,870 --> 00:03:35,490
that I don't know in advance?

106
00:03:35,890 --> 00:03:37,610
Because we're not giving the algorithm

107
00:03:38,160 --> 00:03:40,140
the right answer for the

108
00:03:40,370 --> 00:03:41,270
examples in my data

109
00:03:41,590 --> 00:03:43,090
set, this is Unsupervised Learning.

110
00:03:44,290 --> 00:03:47,040
Unsupervised Learning or clustering is used for a bunch of other applications.

111
00:03:48,340 --> 00:03:50,340
It's used to organize large computer clusters.

112
00:03:51,390 --> 00:03:52,530
I had some friends looking at

113
00:03:52,680 --> 00:03:53,970
large data centers, that is

114
00:03:54,180 --> 00:03:55,970
large computer clusters and trying

115
00:03:56,230 --> 00:03:57,470
to figure out which machines tend to

116
00:03:57,590 --> 00:03:59,130
work together and if

117
00:03:59,200 --> 00:04:00,270
you can put those machines together,

118
00:04:01,100 --> 00:04:03,220
you can make your data center work more efficiently.

119
00:04:04,810 --> 00:04:06,820
The second application is social network analysis.

120
00:04:07,890 --> 00:04:09,230
So given knowledge about which friends

121
00:04:09,630 --> 00:04:10,840
you email the most or

122
00:04:10,880 --> 00:04:12,150
given your Facebook friends or

123
00:04:12,180 --> 00:04:14,150
your Google+ circles, can

124
00:04:14,290 --> 00:04:16,380
we automatically identify which are

125
00:04:16,450 --> 00:04:17,950
cohesive groups of friends,

126
00:04:18,460 --> 00:04:19,420
also which are groups of people

127
00:04:20,230 --> 00:04:21,010
that all know each other?

128
00:04:22,540 --> 00:04:22,880
Market segmentation.

129
00:04:24,680 --> 00:04:26,780
Many companies have huge databases of customer information.

130
00:04:27,700 --> 00:04:28,410
So, can you look at this

131
00:04:28,510 --> 00:04:30,000
customer data set and automatically

132
00:04:30,740 --> 00:04:32,340
discover market segments and automatically

133
00:04:33,340 --> 00:04:35,290
group your customers into different

134
00:04:35,820 --> 00:04:37,400
market segments so that

135
00:04:37,710 --> 00:04:39,490
you can automatically and more

136
00:04:39,650 --> 00:04:41,580
efficiently sell or market

137
00:04:41,890 --> 00:04:43,250
to your different market segments?

138
00:04:44,260 --> 00:04:45,580
Again, this is Unsupervised Learning

139
00:04:45,820 --> 00:04:46,720
because we have all this

140
00:04:46,900 --> 00:04:48,340
customer data, but we don't

141
00:04:48,590 --> 00:04:49,710
know in advance what are the

142
00:04:49,790 --> 00:04:51,270
market segments and for

143
00:04:51,440 --> 00:04:52,570
the customers in our data

144
00:04:52,660 --> 00:04:53,590
set, you know, we don't know in

145
00:04:53,690 --> 00:04:54,700
advance who is in

146
00:04:54,800 --> 00:04:55,840
market segment one, who is

147
00:04:55,940 --> 00:04:57,800
in market segment two, and so on.

148
00:04:57,930 --> 00:05:00,630
But we have to let the algorithm discover all this just from the data.

149
00:05:01,970 --> 00:05:03,140
Finally, it turns out that Unsupervised

150
00:05:03,690 --> 00:05:05,620
Learning is also used for

151
00:05:06,090 --> 00:05:08,060
astronomical data analysis,

152
00:05:08,890 --> 00:05:10,390
and these clustering algorithms give

153
00:05:10,580 --> 00:05:12,440
surprisingly interesting and useful theories

154
00:05:12,900 --> 00:05:15,610
of how galaxies are born.

155
00:05:15,880 --> 00:05:17,620
All of these are examples of clustering,

156
00:05:18,400 --> 00:05:20,550
which is just one type of Unsupervised Learning.

157
00:05:21,530 --> 00:05:22,470
Let me tell you about another one.

158
00:05:23,200 --> 00:05:25,020
I'm gonna tell you about the cocktail party problem.

159
00:05:26,310 --> 00:05:28,270
So, you've been to cocktail parties before, right?

160
00:05:28,440 --> 00:05:30,080
Well, you can imagine there's a

161
00:05:30,300 --> 00:05:31,690
party, room full of people, all

162
00:05:31,870 --> 00:05:32,930
sitting around, all talking at the

163
00:05:32,970 --> 00:05:34,390
same time and there are

164
00:05:34,480 --> 00:05:36,230
all these overlapping voices because everyone

165
00:05:36,590 --> 00:05:37,920
is talking at the same time, and

166
00:05:38,070 --> 00:05:39,730
it is hard to hear the person in front of you.

167
00:05:40,690 --> 00:05:41,970
So maybe at a

168
00:05:42,020 --> 00:05:43,990
cocktail party with two people,

169
00:05:45,690 --> 00:05:46,670
two people talking at the same

170
00:05:46,770 --> 00:05:48,090
time, and it's a somewhat

171
00:05:48,740 --> 00:05:49,710
small cocktail party.

172
00:05:50,690 --> 00:05:51,630
And we're going to put two

173
00:05:51,890 --> 00:05:53,080
microphones in the room so

174
00:05:54,060 --> 00:05:55,640
there are two microphones, and because

175
00:05:56,050 --> 00:05:57,430
these microphones are at two

176
00:05:57,560 --> 00:05:58,900
different distances from the

177
00:05:58,990 --> 00:06:01,250
speakers, each microphone records

178
00:06:01,830 --> 00:06:04,720
a different combination of these two speakers' voices.

179
00:06:05,810 --> 00:06:06,970
Maybe speaker one is a

180
00:06:07,120 --> 00:06:08,320
little louder in microphone one

181
00:06:09,120 --> 00:06:10,680
and maybe speaker two is a

182
00:06:10,800 --> 00:06:12,350
little bit louder in microphone two

183
00:06:12,560 --> 00:06:14,040
because the two microphones are

184
00:06:14,230 --> 00:06:15,950
at different positions relative to

185
00:06:16,400 --> 00:06:19,020
the two speakers, but each

186
00:06:19,250 --> 00:06:20,390
microphone would record an overlapping

187
00:06:20,970 --> 00:06:22,590
combination of both speakers' voices.
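The mixing just described can be sketched in a few lines. This assumes each microphone records a simple weighted sum of the two voices. The signals, the weight matrix `A`, and the helper names `mix` and `unmix` are all hypothetical. The sketch also takes a shortcut: it recovers the voices by inverting weights that it already knows, whereas an unsupervised algorithm would have to estimate that separation from the two recordings alone.

```python
# Hypothetical source signals: short sequences standing in for two voices.
speaker1 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
speaker2 = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5]

# Assumed mixing weights: each microphone hears the nearer speaker more loudly.
A = [[0.8, 0.3],   # microphone 1 = 0.8 * speaker1 + 0.3 * speaker2
     [0.4, 0.7]]   # microphone 2 = 0.4 * speaker1 + 0.7 * speaker2

def mix(A, s1, s2):
    """Return the two microphone recordings as weighted sums of the sources."""
    mic1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
    mic2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
    return mic1, mic2

def unmix(A, mic1, mic2):
    """Invert the 2x2 mixing matrix to recover the sources (weights assumed known)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    s1 = [(A[1][1] * x - A[0][1] * y) / det for x, y in zip(mic1, mic2)]
    s2 = [(-A[1][0] * x + A[0][0] * y) / det for x, y in zip(mic1, mic2)]
    return s1, s2

mic1, mic2 = mix(A, speaker1, speaker2)   # what the microphones would record
rec1, rec2 = unmix(A, mic1, mic2)         # the separated voices
print(mic1)
print(rec1)
```

Both recordings contain both voices, only in different proportions, and that difference in proportions is what makes separation possible at all.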

188
00:06:23,960 --> 00:06:25,500
So here's an actual recording

189
00:06:26,520 --> 00:06:29,280
of two speakers recorded by a researcher.

190
00:06:29,740 --> 00:06:30,950
Let me play for you the

191
00:06:31,060 --> 00:06:32,760
first, what the first microphone sounds like.

192
00:06:33,560 --> 00:06:34,800
One (uno), two (dos),

193
00:06:35,070 --> 00:06:36,590
three (tres), four (cuatro), five

194
00:06:37,060 --> 00:06:38,550
(cinco), six (seis), seven (siete),

195
00:06:38,990 --> 00:06:40,610
eight (ocho), nine (nueve), ten (y diez).

196
00:06:41,610 --> 00:06:42,650
All right, maybe not the most interesting cocktail

197
00:06:43,000 --> 00:06:44,270
party, there's two people

198
00:06:44,620 --> 00:06:45,670
counting from one to ten

199
00:06:46,010 --> 00:06:47,880
in two languages but you know.

200
00:06:48,870 --> 00:06:49,760
What you just heard was the

201
00:06:49,820 --> 00:06:52,500
first microphone recording, here's the second recording.

202
00:06:57,440 --> 00:06:58,040
Uno (one), dos (two), tres (three), cuatro

203
00:06:58,060 --> 00:06:58,730
(four), cinco (five), seis (six), siete (seven),

204
00:06:59,160 --> 00:07:00,900
ocho (eight), nueve (nine) y diez (ten).

205
00:07:01,860 --> 00:07:02,850
So what we can do is take

206
00:07:03,380 --> 00:07:04,660
these two microphone recordings and give

207
00:07:04,980 --> 00:07:06,480
them to an Unsupervised Learning algorithm

208
00:07:07,010 --> 00:07:08,560
called the cocktail party algorithm,

209
00:07:08,780 --> 00:07:09,910
and tell the algorithm

210
00:07:10,450 --> 00:07:12,140
to find structure in this data for you.

211
00:07:12,250 --> 00:07:14,010
And what the algorithm will do

212
00:07:14,410 --> 00:07:15,730
is listen to these

213
00:07:15,980 --> 00:07:17,990
audio recordings and say, you

214
00:07:18,140 --> 00:07:19,020
know it sounds like the

215
00:07:19,360 --> 00:07:20,950
two audio sources are being

216
00:07:21,240 --> 00:07:22,450
added together, or that are being

217
00:07:22,670 --> 00:07:25,220
summed together to produce these recordings that we had.

218
00:07:25,990 --> 00:07:27,330
Moreover, what the cocktail party

219
00:07:27,710 --> 00:07:29,210
algorithm will do is separate

220
00:07:29,570 --> 00:07:30,810
out these two audio sources

221
00:07:31,480 --> 00:07:32,700
that were being added or being

222
00:07:33,000 --> 00:07:34,240
summed together to form our

223
00:07:34,410 --> 00:07:35,600
recordings and, in fact,

224
00:07:36,200 --> 00:07:38,630
here's the first output of the cocktail party algorithm.

225
00:07:39,790 --> 00:07:41,910
One, two, three, four,

226
00:07:42,590 --> 00:07:46,270
five, six, seven, eight, nine, ten.

227
00:07:47,630 --> 00:07:48,780
So, it separated out the English

228
00:07:49,240 --> 00:07:51,220
voice in one of the recordings.

229
00:07:52,460 --> 00:07:53,300
And here's the second output.

230
00:07:53,380 --> 00:07:55,280
Uno, dos, tres, cuatro, cinco,

231
00:07:55,980 --> 00:07:59,830
seis, siete, ocho, nueve y diez.

232
00:08:00,270 --> 00:08:01,180
Not too bad. To give you

233
00:08:03,810 --> 00:08:05,270
one more example, here's another

234
00:08:05,600 --> 00:08:07,370
recording of another similar situation,

235
00:08:08,060 --> 00:08:09,790
here's the first microphone: One,

236
00:08:10,470 --> 00:08:12,430
two, three, four, five, six,

237
00:08:13,370 --> 00:08:15,710
seven, eight, nine, ten.

238
00:08:16,980 --> 00:08:17,920
OK so the poor guy's gone

239
00:08:18,180 --> 00:08:19,350
home from the cocktail party and

240
00:08:19,420 --> 00:08:21,880
he's now sitting in a room by himself talking to his radio.

241
00:08:23,090 --> 00:08:24,130
Here's the second microphone recording.

242
00:08:28,810 --> 00:08:31,800
One, two, three, four, five, six, seven, eight, nine, ten.

243
00:08:33,310 --> 00:08:34,160
When you give these two microphone

244
00:08:34,610 --> 00:08:35,530
recordings to the same algorithm,

245
00:08:36,360 --> 00:08:37,790
what it does, is again say,

246
00:08:38,380 --> 00:08:39,470
you know, it sounds like there

247
00:08:39,690 --> 00:08:41,370
are two audio sources, and moreover,

248
00:08:42,410 --> 00:08:43,820
the algorithm says, here is

249
00:08:44,070 --> 00:08:46,010
the first of the audio sources I found.

250
00:08:47,480 --> 00:08:49,300
One, two, three, four,

251
00:08:49,730 --> 00:08:53,430
five, six, seven, eight, nine, ten.

252
00:08:54,650 --> 00:08:56,110
So that wasn't perfect, it

253
00:08:56,340 --> 00:08:57,360
got the voice, but it

254
00:08:57,570 --> 00:08:59,070
also got a little bit of the music in there.

255
00:08:59,890 --> 00:09:01,360
Then here's the second output of the algorithm.

256
00:09:10,020 --> 00:09:11,310
Not too bad, in that second

257
00:09:11,540 --> 00:09:13,300
output it managed to get rid of the voice entirely.

258
00:09:13,760 --> 00:09:14,850
And just, you know,

259
00:09:15,020 --> 00:09:17,380
cleaned up the music, got rid of the counting from one to ten.

260
00:09:18,840 --> 00:09:20,090
So you might look at

261
00:09:20,180 --> 00:09:21,750
an Unsupervised Learning algorithm like

262
00:09:21,950 --> 00:09:23,050
this and ask how

263
00:09:23,250 --> 00:09:25,110
complicated it is to implement this, right?

264
00:09:25,330 --> 00:09:26,560
It seems like in order to,

265
00:09:26,970 --> 00:09:28,870
you know, build this application, it seems

266
00:09:28,930 --> 00:09:30,550
like to do this audio processing you

267
00:09:30,670 --> 00:09:31,430
need to write a ton of code

268
00:09:32,240 --> 00:09:33,580
or maybe link into like a

269
00:09:33,690 --> 00:09:35,380
bunch of synthesizer Java libraries that

270
00:09:35,470 --> 00:09:37,150
process audio, seems like

271
00:09:37,240 --> 00:09:38,880
a really complicated program, to do

272
00:09:39,060 --> 00:09:41,040
this audio, separating out audio and so on.

273
00:09:42,460 --> 00:09:43,860
It turns out the algorithm, to

274
00:09:44,070 --> 00:09:45,640
do what you just heard, that

275
00:09:45,900 --> 00:09:47,280
can be done with one line

276
00:09:47,530 --> 00:09:49,270
of code - shown right here.

277
00:09:50,640 --> 00:09:52,350
It took researchers a long

278
00:09:52,610 --> 00:09:54,060
time to come up with this line of code.

279
00:09:54,490 --> 00:09:56,090
I'm not saying this is an easy problem,

280
00:09:57,080 --> 00:09:57,980
but it turns out that when you

281
00:09:58,180 --> 00:10:00,330
use the right programming environment, many learning

282
00:10:00,670 --> 00:10:02,060
algorithms can be really short programs.

283
00:10:03,510 --> 00:10:04,700
So this is also why in

284
00:10:04,840 --> 00:10:05,890
this class we're going to

285
00:10:06,010 --> 00:10:07,430
use the Octave programming environment.

286
00:10:08,550 --> 00:10:09,910
Octave is free, open-source

287
00:10:10,120 --> 00:10:11,620
software, and using a

288
00:10:11,670 --> 00:10:13,130
tool like Octave or Matlab,

289
00:10:14,000 --> 00:10:15,400
many learning algorithms become just

290
00:10:15,690 --> 00:10:17,910
a few lines of code to implement.

291
00:10:18,380 --> 00:10:19,400
Later in this class, I'll just teach

292
00:10:19,620 --> 00:10:20,570
you a little bit about how to

293
00:10:20,720 --> 00:10:21,920
use Octave and you'll be

294
00:10:22,050 --> 00:10:24,590
implementing some of these algorithms in Octave.

295
00:10:24,980 --> 00:10:26,050
Or if you have Matlab you can use that too.

296
00:10:27,120 --> 00:10:28,500
It turns out that in Silicon Valley, for

297
00:10:28,620 --> 00:10:29,470
a lot of machine learning algorithms,

298
00:10:30,290 --> 00:10:31,310
what we do is first prototype

299
00:10:32,040 --> 00:10:33,900
our software in Octave because prototyping

300
00:10:34,330 --> 00:10:35,250
in Octave makes it incredibly fast

301
00:10:35,540 --> 00:10:36,920
to implement these learning algorithms.

302
00:10:38,230 --> 00:10:39,110
Here each of these functions

303
00:10:39,720 --> 00:10:41,460
like for example the SVD

304
00:10:41,680 --> 00:10:42,920
function, which stands for singular

305
00:10:43,240 --> 00:10:44,520
value decomposition, turns

306
00:10:44,640 --> 00:10:45,690
out to be a

307
00:10:45,820 --> 00:10:48,420
linear algebra routine that is just built into Octave.

308
00:10:49,500 --> 00:10:50,390
If you were trying to do this

309
00:10:50,460 --> 00:10:51,490
in C++ or Java,

310
00:10:51,780 --> 00:10:53,040
this would be many many lines of

311
00:10:53,180 --> 00:10:55,680
code linking complex C++ or Java libraries.

312
00:10:56,440 --> 00:10:57,490
So, you can implement this stuff in

313
00:10:57,680 --> 00:10:58,690
C++ or Java

314
00:10:59,050 --> 00:11:00,090
or Python, it's just much

315
00:11:00,290 --> 00:11:02,090
more complicated to do so in those languages.

316
00:11:03,750 --> 00:11:05,060
What I've seen after having taught

317
00:11:05,300 --> 00:11:06,980
machine learning for almost a

318
00:11:07,210 --> 00:11:08,680
decade now, is that, you

319
00:11:08,890 --> 00:11:10,340
learn much faster if you

320
00:11:10,480 --> 00:11:11,700
use Octave as your

321
00:11:11,790 --> 00:11:14,070
programming environment, and if

322
00:11:14,250 --> 00:11:15,570
you use Octave as your

323
00:11:16,260 --> 00:11:17,110
learning tool and as your

324
00:11:17,240 --> 00:11:18,640
prototyping tool, it'll let

325
00:11:19,000 --> 00:11:21,280
you learn and prototype learning algorithms much more quickly.

326
00:11:22,640 --> 00:11:23,850
And in fact what many people will

327
00:11:23,990 --> 00:11:25,390
do in the large Silicon

328
00:11:25,730 --> 00:11:27,360
Valley companies is, in fact, use

329
00:11:27,560 --> 00:11:29,020
an environment like Octave to first

330
00:11:29,370 --> 00:11:31,110
prototype the learning algorithm, and

331
00:11:31,510 --> 00:11:32,780
only after you've gotten it

332
00:11:32,860 --> 00:11:33,820
to work, then you migrate

333
00:11:34,390 --> 00:11:35,910
it to C++ or Java or whatever.

334
00:11:36,890 --> 00:11:37,960
It turns out that by doing

335
00:11:38,220 --> 00:11:39,070
things this way, you can often

336
00:11:39,400 --> 00:11:40,440
get your algorithm to work much

337
00:11:41,300 --> 00:11:43,050
faster than if you were starting out in C++.

338
00:11:44,440 --> 00:11:46,010
So, I know that as an

339
00:11:46,100 --> 00:11:47,490
instructor, I get to

340
00:11:47,570 --> 00:11:48,580
say "trust me on

341
00:11:48,730 --> 00:11:49,790
this one" only a finite

342
00:11:50,030 --> 00:11:51,420
number of times, but for

343
00:11:51,560 --> 00:11:52,720
those of you who've never used these

344
00:11:53,330 --> 00:11:54,880
Octave type programming environments before,

345
00:11:55,240 --> 00:11:56,070
I am going to ask you

346
00:11:56,130 --> 00:11:56,970
to trust me on this one,

347
00:11:57,570 --> 00:11:58,950
and say that,

348
00:11:59,700 --> 00:12:01,180
I think your time, your development

349
00:12:01,700 --> 00:12:03,100
time is one of the most valuable resources.

350
00:12:04,210 --> 00:12:05,570
And having seen lots

351
00:12:05,800 --> 00:12:06,850
of people do this, I think

352
00:12:07,190 --> 00:12:08,460
you as a machine learning

353
00:12:08,850 --> 00:12:09,990
researcher, or machine learning developer

354
00:12:10,830 --> 00:12:12,080
will be much more productive if

355
00:12:12,220 --> 00:12:13,010
you learn to start by prototyping,

356
00:12:13,580 --> 00:12:15,250
to start in Octave, rather than in some other language.

357
00:12:17,570 --> 00:12:19,790
Finally, to wrap

358
00:12:20,090 --> 00:12:22,890
up this video, I have one quick review question for you.

359
00:12:24,400 --> 00:12:26,400
We talked about Unsupervised Learning, which

360
00:12:26,700 --> 00:12:27,670
is a learning setting where you

361
00:12:27,760 --> 00:12:28,730
give the algorithm a ton

362
00:12:28,840 --> 00:12:30,120
of data and just ask it

363
00:12:30,240 --> 00:12:32,900
to find structure in the data for us.

364
00:12:33,160 --> 00:12:35,170
Of the following four examples, which

365
00:12:35,490 --> 00:12:36,410
ones, which of these four

366
00:12:36,870 --> 00:12:37,630
do you think would be

367
00:12:37,720 --> 00:12:39,520
an Unsupervised Learning problem as

368
00:12:40,220 --> 00:12:41,950
opposed to a Supervised Learning problem?

369
00:12:42,730 --> 00:12:43,590
For each of the four

370
00:12:43,860 --> 00:12:44,850
check boxes on the left,

371
00:12:45,640 --> 00:12:46,900
check the ones for which

372
00:12:47,210 --> 00:12:49,400
you think an Unsupervised Learning

373
00:12:49,700 --> 00:12:51,300
algorithm would be appropriate and

374
00:12:51,440 --> 00:12:53,930
then click the button on the lower right to check your answer.

375
00:12:54,690 --> 00:12:57,030
So when the video pauses, please

376
00:12:57,370 --> 00:12:58,750
answer the question on the slide.

377
00:13:01,860 --> 00:13:03,950
So, hopefully, you've remembered the spam folder problem.

378
00:13:04,710 --> 00:13:06,310
If you have labeled data, you

379
00:13:06,450 --> 00:13:07,680
know, with spam and

380
00:13:07,800 --> 00:13:10,470
non-spam e-mail, we'd treat this as a Supervised Learning problem.

381
00:13:11,620 --> 00:13:13,870
The news story example, that's

382
00:13:14,100 --> 00:13:15,370
exactly the Google News example

383
00:13:15,910 --> 00:13:16,600
that we saw in this video,

384
00:13:17,090 --> 00:13:17,950
we saw how you can use

385
00:13:18,080 --> 00:13:19,460
a clustering algorithm to cluster

386
00:13:19,880 --> 00:13:21,980
these articles together so that's Unsupervised Learning.

387
00:13:23,250 --> 00:13:25,440
The market segmentation example I

388
00:13:25,510 --> 00:13:27,120
talked about a little bit earlier, you

389
00:13:27,220 --> 00:13:29,110
can do that as an Unsupervised Learning problem

390
00:13:29,970 --> 00:13:30,860
because I am just gonna

391
00:13:30,930 --> 00:13:32,340
give my algorithm data and ask

392
00:13:32,500 --> 00:13:34,340
it to discover market segments automatically.

393
00:13:35,610 --> 00:13:37,930
And the final example, diabetes, well,

394
00:13:38,070 --> 00:13:39,080
that's actually just like our

395
00:13:39,350 --> 00:13:41,480
breast cancer example from the last video.

396
00:13:42,190 --> 00:13:43,320
Only instead of, you know,

397
00:13:43,600 --> 00:13:45,280
good and bad cancer tumors or

398
00:13:45,550 --> 00:13:47,390
benign or malignant tumors we

399
00:13:47,550 --> 00:13:49,270
instead have diabetes or

400
00:13:49,330 --> 00:13:50,440
not and so we will

401
00:13:50,700 --> 00:13:51,830
use that as a supervised,

402
00:13:52,370 --> 00:13:53,740
we will solve that as

403
00:13:53,870 --> 00:13:54,670
a Supervised Learning problem just like

404
00:13:54,730 --> 00:13:56,450
we did for the breast tumor data.

405
00:13:58,270 --> 00:13:59,400
So, that's it for Unsupervised

406
00:14:00,100 --> 00:14:01,580
Learning and in the

407
00:14:01,650 --> 00:14:02,940
next video, we'll delve more

408
00:14:03,270 --> 00:14:04,600
into specific learning algorithms

409
00:14:05,550 --> 00:14:06,590
and start to talk about

410
00:14:07,220 --> 00:14:08,750
just how these algorithms work and

411
00:14:08,920 --> 00:14:11,270
how we can, how you can go about implementing them.