Okay, so in this course, we've talked about lots of different machine learning methods and lots of applications where these types of methods can be very impactful. But of course, lots of open challenges still remain in machine learning, so let's discuss some of them.

One is the fact that we often have a choice of which model to use. For example, when we talked about recommending products, we said we could use a classification model, where we take features of the user and the product and pass them through a classifier to predict whether the person will like the product or not. But we also talked about using matrix factorization, where we learn features about users and products and use those to recommend products to users. And then we talked about featurized matrix factorization, which combines these two ideas. The list of possible models we can consider for a task is often very large, and this typically leaves the practitioner perplexed: which model should I use? Searching over this set of possible choices is still an open challenge in machine learning. (A sketch of the matrix factorization option appears just after this passage.)

Another really important challenge we're often faced with is how to represent our data. For example, when we talked about our document modeling and document retrieval tasks, we said we could use raw word counts, or we could normalize the vectors, or we could use things like tf-idf to account for very popular words and put more emphasis on the important words in a document. But honestly, there are lots of different variants of tf-idf; we just provided one example. You could also think about using bigrams and trigrams, and there are lots and lots of ways to represent the words that appear in a document we'd like to model. (See the second sketch below.)

But that's just for a document. Maybe we have images instead; how do we represent an image? We've talked about some ways and we'll talk about others, but there are lots of challenges there. Or maybe you have data that's really network based, like data from Facebook. So you can have very complicated data structures coming from very different, diverse data sets, and we want to be able to use the types of methods we've described on all of them.
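To make the model-choice point concrete, here is a minimal sketch of the matrix factorization idea: learn a small number of latent features per user and per product from observed ratings using stochastic gradient descent. This is an illustrative toy, not the course's implementation; the data, dimensions, and hyperparameters are all assumptions.

```python
import numpy as np

# Toy observed ratings: (user_id, product_id, rating). Purely illustrative data.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_products, k = 3, 3, 2  # k = number of latent features (assumed)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(n_users, k))     # latent user features
V = rng.normal(scale=0.1, size=(n_products, k))  # latent product features

lr, reg = 0.05, 0.01  # learning rate and L2 regularization (assumed values)
for epoch in range(200):
    for u, p, r in ratings:
        err = r - U[u] @ V[p]        # prediction error on this observed rating
        u_row = U[u].copy()          # keep the old user row for V's gradient
        # Gradient steps on the regularized squared error
        U[u] += lr * (err * V[p] - reg * U[u])
        V[p] += lr * (err * u_row - reg * V[p])

# Predicted score for a (user, product) pair we never observed
print("predicted rating:", U[0] @ V[2])
```

The classification approach and the featurized combination would look quite different in code, which is exactly the practitioner's dilemma: several plausible models, no obvious winner up front.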
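And as one concrete illustration of the representation question for text, here is a minimal sketch using scikit-learn (an assumption; the course itself may use other tooling), showing raw counts, tf-idf, and an n-gram variant side by side. The toy corpus is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog jumps over the lazy fox",
]  # toy corpus, purely illustrative

# Raw word counts: one column per distinct word
counts = CountVectorizer().fit_transform(corpus)

# tf-idf: downweights words that are common across the corpus (like "the")
tfidf = TfidfVectorizer().fit_transform(corpus)

# Unigrams plus bigrams: a richer, and much larger, representation
bigrams = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(corpus)

print(counts.shape, tfidf.shape, bigrams.shape)
```

Each of these is a defensible choice, and which one works best for a given retrieval task is exactly the kind of question the search over representations has to answer.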
How we represent our data, of course, is going to have a significant impact on the types of inferences we make from that data. So this is a really, really important problem, and there's no one method for choosing the right representation of your data.

One of the other really significant challenges we're faced with in machine learning these days is how to scale up in multiple dimensions. One aspect of this is the fact that data is getting bigger and bigger, something that's been talked about extensively in the media. So let's just describe a few situations in which we're faced with a growing amount of data.

One is the fact that there's a large number of platforms out there for social networking, collecting data via crowdsourcing, sharing your photos and videos, reviewing restaurants, and things like that. The list of ways you can now go online and give data to the world is growing, and the number of people doing this and providing data is growing at a huge rate. So we have lots of new data sources available to us.

In addition, think about the way we buy products: we no longer just go to a store and have some handwritten record of what product was purchased. We now have vendors like Amazon with huge online marketplaces, collecting data about different products, customers, and purchases, and lots and lots of data from sources like these.

And beyond these types of websites, there are also a lot of devices we can now wear. I can wear a watch that monitors all the activities I'm doing and how I'm sleeping at night, or glasses that record everything I'm seeing. There's also the Internet of Things, which is just lots of connected devices and lots of different sources of information communicating with one another.

So these are just some of the areas in which we're seeing lots and lots of new data sources, but of course that's not exhaustive. We can also talk about things like medical records.
Again, no longer do you go into your doctor's office and just have them write notes by hand that get put in some file. Often they're taking electronic health records, and these now communicate across systems, so we have lots and lots of electronic health records: a source of data to be parsed, understood, and used to innovate in medicine.

So there are lots of new data sets, which is exciting. We can learn a lot about how our bodies operate, about how people purchase and make friends, and about how they go about their day-to-day activities. But of course, we need methods that scale to analyze these types of data sets, that handle the unique structure of the data they present, that deal with the noise, and so on; the list of challenges is really extensive. This is one of the very big challenges in machine learning: how to deal with this big data.

Simultaneously with data getting really large, we're also faced with the fact that the models we use to analyze these increasingly complex data sets are growing too. The models themselves are becoming bigger and more complicated in order to extract information from these very intricate and very large data sources. Just as an example, when we talked about clustering, we discussed an application where you have recordings of brain activity taken over time. This is just one quick example of a model that was used to analyze that type of data set, and without going into the details of what's shown on the slide, just realize that there are lots of circles and lots of arrows, which means this is a really big, complicated model.

So you might think, okay, data's getting bigger and models are getting bigger, but that's okay because processors are getting faster. Well, that was the story for a while: we were seeing exponential increases in processor speed. But that stopped about a decade ago, and now we're seeing only very marginal increases in the speed of an individual processor. So instead, we have to think about new ways to scale up.
And the typical thing we're leveraging these days is collections of processors. There are different architectures we can use: GPUs, multicore machines, clusters, cloud computing resources, and really fancy, expensive supercomputers. So that's great; those are really powerful, or potentially powerful, computing resources.

But the question is how we use these in machine learning, and there we face a number of challenges. One is taking our machine learning algorithms and thinking about how to distribute them across these different processors so that everything we want to run executes in a coherent way; that's very challenging. Another is how we distribute the data across these different machines, and how we do all of this in a way that's tolerant to failures of the individual machines. (A toy sketch of the data-distribution idea appears at the end of this section.)

So these represent a number of challenges that we're facing in machine learning, and a lot of exciting research is coming out to start addressing these problems.
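To give a flavor of the data-distribution idea just mentioned, here is a minimal sketch that partitions a data set across worker processes and combines their partial results, in this case partial sums for computing a mean. This is a toy illustration of data parallelism on a single machine, not a fault-tolerant distributed system; the data, shard count, and function names are all assumptions.

```python
from multiprocessing import Pool

import numpy as np


def partial_stats(chunk):
    """Compute this worker's contribution: (sum, count) for its shard of the data."""
    return chunk.sum(), chunk.size


if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=float)  # toy data set, purely illustrative

    # "Distribute" the data: split it into shards, one per worker
    shards = np.array_split(data, 4)

    # Each worker computes statistics on its own shard in parallel
    with Pool(processes=4) as pool:
        results = pool.map(partial_stats, shards)

    # Combine the partial results into the global answer
    total, count = map(sum, zip(*results))
    print("mean =", total / count)  # matches data.mean()
```

Real systems layer a lot on top of this pattern: replicating shards so lost data can be recovered, schedulers that reassign work when a machine fails, and communication-efficient ways to combine model updates rather than simple sums. Those are precisely the open challenges discussed above.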