1
00:00:02,530 --> 00:00:05,825
Greetings! Welcome back to the deep learning course.

2
00:00:05,825 --> 00:00:08,220
It's been almost a month since we began, and so far you've made

3
00:00:08,220 --> 00:00:10,778
great progress in the field of artificial neural networks for general data.

4
00:00:10,778 --> 00:00:15,100
And particularly, you've studied a lot of advanced architectures for vision.

5
00:00:15,100 --> 00:00:18,025
You've trained your convolutional neural networks to classify images.

6
00:00:18,025 --> 00:00:21,655
You also studied how to perform more advanced tasks based on the images.

7
00:00:21,655 --> 00:00:24,780
However, all those problems lie in the domain of supervised learning.

8
00:00:24,780 --> 00:00:27,850
It's where you have some kind of source data,

9
00:00:27,850 --> 00:00:30,675
some features, and you have reference answers for those features,

10
00:00:30,675 --> 00:00:35,345
whether they are classes like cats or dogs, or some continuous value like, say,

11
00:00:35,345 --> 00:00:40,367
if you give it a person and it predicts the salary

12
00:00:40,367 --> 00:00:42,295
he or she gets hourly.

13
00:00:42,295 --> 00:00:45,645
Unsupervised learning, which we're moving on to now, however,

14
00:00:45,645 --> 00:00:49,650
has no access to this target variable and essentially has a more general,

15
00:00:49,650 --> 00:00:54,130
less defined task of kind of explaining the data, finding the structure.

16
00:00:54,130 --> 00:00:57,700
So if supervised learning finds you a boundary between cats and dogs,

17
00:00:57,700 --> 00:01:02,734
unsupervised learning takes unlabeled pictures of cats and dogs uploaded to,

18
00:01:02,734 --> 00:01:07,260
well, Kaggle, and tries to find whether there are maybe some clusters or

19
00:01:07,260 --> 00:01:12,109
some groups that correspond to one kind of object in this data.

20
00:01:12,109 --> 00:01:14,430
It tries to maybe compress the data.
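The cluster-finding idea just described can be sketched with a toy k-means implementation. This is a minimal sketch in NumPy, not the method the lecture will necessarily use; the two well-separated Gaussian blobs simply stand in for unlabeled images of two kinds of objects:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Toy k-means: group unlabeled points into k clusters."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        # (keep a centroid in place if it lost all its points)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# two well-separated blobs standing in for two kinds of unlabeled objects
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, _ = kmeans(X, k=2)
print(labels)
```

Note that no reference answers are used anywhere: the grouping emerges from the geometry of the data alone.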
21
00:01:14,430 --> 00:01:17,745
It tries to find some good feature representation that is useful for other problems.

22
00:01:17,745 --> 00:01:20,820
Of course, finding some hidden structure in the data is not as well defined

23
00:01:20,820 --> 00:01:24,574
a mathematical problem as it would be, for example, for image classification:

24
00:01:24,574 --> 00:01:26,045
you're not learning any particular mapping.

25
00:01:26,045 --> 00:01:30,124
Well, in fact you do, but it's not clear which mapping is better and which is worse.

26
00:01:30,124 --> 00:01:33,090
So there's no definitive criterion, and there are

27
00:01:33,090 --> 00:01:35,490
many unsupervised methods that cover one

28
00:01:35,490 --> 00:01:38,725
or the other approach to what's better and what's worse.

29
00:01:38,725 --> 00:01:43,155
Basically, the unsupervised learning methods are best studied by the way

30
00:01:43,155 --> 00:01:45,120
they help you solve problems, and each problem is

31
00:01:45,120 --> 00:01:47,775
solved with a particular branch of unsupervised learning methods.

32
00:01:47,775 --> 00:01:49,685
Today, we're going to study a lot of them,

33
00:01:49,685 --> 00:01:52,410
but let's quickly recap what problems can

34
00:01:52,410 --> 00:01:57,610
be solved by this kind of hidden structure finding.

35
00:01:57,610 --> 00:02:00,475
For example, imagine you're working with raw data on Kaggle,

36
00:02:00,475 --> 00:02:04,205
so there is yet another competition that makes you work with images, whether

37
00:02:04,205 --> 00:02:08,780
cats and dogs, satellite images, maybe medical scans, or maybe sound data.

38
00:02:08,780 --> 00:02:13,120
And you're usually given this data in the form of raw pixels if it's an image,

39
00:02:13,120 --> 00:02:18,060
or maybe frequency series if it's sound data.
40
00:02:18,060 --> 00:02:23,055
What you actually want to do with this data is to make some kind of high-level decision;

41
00:02:23,055 --> 00:02:24,345
say, if it is an image,

42
00:02:24,345 --> 00:02:27,795
you may want to classify it or maybe detect objects on it.

43
00:02:27,795 --> 00:02:31,080
If it's a sound, you may want to identify the genre or maybe give

44
00:02:31,080 --> 00:02:35,035
some recommendations to people who listen to this particular music, if it's music.

45
00:02:35,035 --> 00:02:38,950
And generally, those tasks are better solved not with the raw piece of data,

46
00:02:38,950 --> 00:02:40,650
but with some higher-level representation,

47
00:02:40,650 --> 00:02:43,435
which is usually learned by your supervised neural networks.

48
00:02:43,435 --> 00:02:44,835
But here's the catch, you see:

49
00:02:44,835 --> 00:02:50,190
you usually get only a scarce amount of labeled data within your context.

50
00:02:50,190 --> 00:02:51,690
So maybe if you're solving

51
00:02:51,690 --> 00:02:54,512
genre classification, or maybe if you're trying to recommend,

52
00:02:54,512 --> 00:02:57,345
there are only so many music tracks that

53
00:02:57,345 --> 00:03:00,850
are available for recommendation that your server knows about.

54
00:03:00,850 --> 00:03:03,085
But there are tonnes of unlabeled music out there.

55
00:03:03,085 --> 00:03:05,990
You can literally download hundreds of mp3s;

56
00:03:05,990 --> 00:03:07,993
maybe it's of questionable legality,

57
00:03:07,993 --> 00:03:10,395
but you can download gigabytes of music.

58
00:03:10,395 --> 00:03:13,785
For example, say you want to take a lot of image data,

59
00:03:13,785 --> 00:03:17,255
maybe gigabytes of human faces, and transfer it over the web.
60
00:03:17,255 --> 00:03:19,800
What you want to do then, instead of transferring the raw pixels, is

61
00:03:19,800 --> 00:03:22,340
to maybe find some compressed representation, say,

62
00:03:22,340 --> 00:03:26,735
a hundred or two hundred float values that describe a face,

63
00:03:26,735 --> 00:03:29,760
compress each face into this representation,

64
00:03:29,760 --> 00:03:33,450
transfer it over the web, and then decompress the codes back into faces.

65
00:03:33,450 --> 00:03:37,928
This is actually possible due to the fact that image data is very redundant,

66
00:03:37,928 --> 00:03:40,740
so even if you as a human were not given the entire image,

67
00:03:40,740 --> 00:03:43,445
if there was some noise or maybe some pixels were missing,

68
00:03:43,445 --> 00:03:47,555
you would easily understand what was there in place of the missing pixels.

69
00:03:47,555 --> 00:03:49,295
You want your neural network to do the same:

70
00:03:49,295 --> 00:03:51,720
to find this optimal representation that does not have

71
00:03:51,720 --> 00:03:55,510
this redundancy, so that you can transmit as little information as possible.

72
00:03:55,510 --> 00:03:58,020
Unsupervised learning is also widely used in

73
00:03:58,020 --> 00:04:01,425
the domain of image retrieval, or any retrieval basically.

74
00:04:01,425 --> 00:04:03,720
For those of you who are hearing this word for the first time,

75
00:04:03,720 --> 00:04:06,330
image retrieval is the thing that search companies do.

76
00:04:06,330 --> 00:04:08,105
So Google does it, Yandex does it.

77
00:04:08,105 --> 00:04:11,385
And retrieval is basically when the user enters a query and you want to

78
00:04:11,385 --> 00:04:15,105
reply with a lot of documents or images that match this query.
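In a learned representation space, that query-matching step can be sketched as nearest-neighbor search over embedding vectors. This is a minimal sketch: the embedding values below are made-up numbers for illustration, whereas in a real system a neural network would produce them from the images and the query:

```python
import numpy as np

def top_k(query_vec, db_vecs, k=2):
    """Rank database items by cosine similarity to the query embedding."""
    # cosine similarity = dot product of L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    db = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = db @ q
    # indices of the k most similar items, best first
    return np.argsort(-sims)[:k]

# tiny hypothetical database of image embeddings
db = np.array([
    [0.9, 0.1, 0.0],   # 0: a cat on a mat
    [0.8, 0.2, 0.1],   # 1: another cat on a mat
    [0.0, 1.0, 0.2],   # 2: a dog in a park
    [0.1, 0.0, 1.0],   # 3: a satellite image
])
query = np.array([1.0, 0.1, 0.0])  # embedding of a "cat on a mat" query
print(top_k(query, db))  # indices of the two best matches
```

The hard part, of course, is learning embeddings in which semantically similar items really do end up close together; the search itself is just this dot product.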
79
00:04:15,105 --> 00:04:17,220
So, I mean, you search for, say,

80
00:04:17,220 --> 00:04:19,740
a cat that sat on a mat, and you want

81
00:04:19,740 --> 00:04:22,629
to filter out all the images in the database that don't contain

82
00:04:22,629 --> 00:04:28,895
such a cat, and only respond with maybe the top 10 cats that are sitting on a mat.

83
00:04:28,895 --> 00:04:31,770
Trust me, the recent increase in the quality of image search, for

84
00:04:31,770 --> 00:04:35,315
example, is due in large part to the success of deep learning applied

85
00:04:35,315 --> 00:04:37,920
to it, and there is some kind of representation learning, which we're going to

86
00:04:37,920 --> 00:04:42,555
study later this week, at play here.

87
00:04:42,555 --> 00:04:47,325
Finally, you could use unsupervised learning when you try to generate new data.

88
00:04:47,325 --> 00:04:50,660
This is kind of the reverse of the problem you solved last week.

89
00:04:50,660 --> 00:04:54,740
So previously, you wanted to process a cat's image into a label;

90
00:04:54,740 --> 00:04:59,090
now you get a label only, or no data at all, and you want to generate a cat: an image

91
00:04:59,090 --> 00:05:04,160
that resembles a cat, maybe one that is indistinguishable from actual images.

92
00:05:04,160 --> 00:05:08,420
This can also be done with some unsupervised learning methods that we are going to cover.

93
00:05:08,420 --> 00:05:11,050
Now finally, there is exploratory data analysis.

94
00:05:11,050 --> 00:05:13,880
This is the least defined of the methods here,

95
00:05:13,880 --> 00:05:16,820
or of the problems if you wish, because here

96
00:05:16,820 --> 00:05:20,030
you don't actually want to build an algorithm so much as you

97
00:05:20,030 --> 00:05:23,720
want to look into the data and find whether there is some pattern there.
98
00:05:23,720 --> 00:05:26,450
This is especially useful if the data is low level. For example,

99
00:05:26,450 --> 00:05:30,425
you have a lot of brain scans, maybe thousands, and you have no time to

100
00:05:30,425 --> 00:05:36,110
actually look at all of them and process all the information manually;

101
00:05:36,110 --> 00:05:40,340
instead, you want to use data analysis tricks to find whether there is

102
00:05:40,340 --> 00:05:42,134
maybe some particular pattern there that you may

103
00:05:42,134 --> 00:05:45,570
exploit for your problem, say, medical diagnosis.
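One such data analysis trick, which is also the mechanism behind the compress-transfer-decompress idea from earlier in the video, is projecting the data onto a few principal components (PCA, essentially a linear autoencoder). A minimal sketch with NumPy, on synthetic data: 64 "pixels" per sample that secretly live on a 5-dimensional subspace, i.e. highly redundant data:

```python
import numpy as np

rng = np.random.default_rng(0)
# fake "images": 200 samples of 64 pixels lying on a 5-dim subspace plus noise
basis = rng.normal(size=(5, 64))
codes_true = rng.normal(size=(200, 5))
X = codes_true @ basis + 0.01 * rng.normal(size=(200, 64))

mean = X.mean(axis=0)
# principal directions of the centered data via SVD
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
W = Vt[:5]                   # "encoder": 64 pixels -> 5 floats

codes = (X - mean) @ W.T     # the 5 floats you would transfer over the web
X_rec = codes @ W + mean     # "decoder": 5 floats -> 64 pixels

# how much of the data's variance the 5-float codes fail to capture
err = np.mean((X - X_rec) ** 2) / np.mean((X - mean) ** 2)
print(f"code shape: {codes.shape}, relative reconstruction error: {err:.4f}")
```

Because the data really is redundant, the reconstruction from just 5 numbers per sample is nearly perfect; deep autoencoders apply the same principle with nonlinear encoders and decoders.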