Hi, everyone. Today, we will discuss a new method for visualizing data and extracting features. At the end of this video, you will be able to use t-SNE in your projects. In the previous video, we learned about matrix factorization, a technique that is pretty close to linear models. In this video, we will touch on the subject of non-linear methods of dimensionality reduction, which in general are called manifold learning. For example, look at the data in the form of the letter S on the left side of the slide. On the right, we can see the results of running different manifold learning algorithms on that data. The t-SNE result is placed in the bottom right corner of the slide. This algorithm is the main topic of the lecture, but the details of how it really works won't be explained here; you are welcome to look at the additional materials for the details. Let's just say that this is a method that tries to project points from a high-dimensional space into a low-dimensional space so that the distances between points are approximately preserved.

Let's look at an example of t-SNE on the MNIST dataset. Here, points from a 784-dimensional space are projected into a two-dimensional space. You can see that such a projection forms explicit clusters. The colors show that these clusters are meaningful and correspond well to the target numbers. Moreover, neighboring clusters correspond to visually similar digits. For example, the cluster of threes is located next to the cluster of fives, which in turn is adjacent to the clusters of sixes and eights. If data has an explicit structure, as in the case of the MNIST dataset, it is likely to be reflected on the t-SNE plot. For that reason, t-SNE is widely used in exploratory data analysis.
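As a rough illustration of this kind of plot, here is a minimal sketch using scikit-learn's built-in digits dataset, a small 64-dimensional stand-in for the full MNIST data shown on the slide:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Load a small MNIST-like dataset: 8x8 digit images, i.e. 64-dimensional points.
X, y = load_digits(return_X_y=True)

# Project the 64-dimensional points into two dimensions with t-SNE.
projection = TSNE(n_components=2, random_state=42).fit_transform(X)

# Color each projected point by its digit label: if the projection is
# meaningful, the clusters should correspond to the target numbers.
plt.scatter(projection[:, 0], projection[:, 1], c=y, cmap='tab10', s=5)
plt.colorbar(label='digit')
plt.show()
```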
However, do not assume that t-SNE is a magic wand that always helps. For example, an unfortunate choice of hyperparameters may lead to poor results. Consider an example: in the center of the slide is a t-SNE projection of exactly the same MNIST data as in the previous example, only the perplexity parameter has been changed. On the left, for comparison, we have the plot from the previous slide. On the right, the slide presents a t-SNE projection of random data. We can see that the choice of hyperparameters changes the projection of the MNIST data significantly, so that we cannot see the clusters anymore. Moreover, the new projection becomes more similar to the random data than to the original projection.

Let's find out how the result depends on the value of the perplexity hyperparameter. On the left, we have perplexity = 3; in the center, perplexity = 10; and on the right, perplexity = 150. I want to emphasize that these projections are all made for the same data. The illustration shows that t-SNE results strongly depend on its parameters, and that interpreting the results is not a simple task. In particular, one cannot infer the size of the original clusters from the size of the projected clusters. A similar proposition holds for the distances between clusters. The blog distill.pub contains a post about how to understand and interpret the results of t-SNE. It also contains a great interactive demo that will help you get into the issues of how t-SNE works. I strongly advise you to take a look at it.

In addition to exploratory data analysis, t-SNE can be used as a method to obtain new features from data: you just concatenate the transformed coordinates to the original feature matrix. Now, a few words about practical details. As has been shown earlier, the result of the t-SNE algorithm strongly depends on hyperparameters, so it is good practice to use several projections with different perplexities. In addition, because of the stochasticity of this method, it produces different projections even with the same data and hyperparameters. This means the train and test sets should be projected together rather than separately. Also, t-SNE will run for a long time if you have a lot of features: if the number of features is greater than 500, you should use one of the dimensionality reduction approaches to reduce the number of features, for example, to 100. An implementation of t-SNE can be found in the sklearn library, but personally I prefer to use another implementation from a separate Python package called tsne, since it provides a way more efficient implementation.
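A minimal sketch of this recipe might look as follows; the helper name, the default perplexities, and the PCA-to-100 step are illustrative, while the key points from the lecture are projecting train and test together and trying several perplexity values:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def tsne_features(X_train, X_test, perplexities=(5, 30, 50), random_state=42):
    """Build t-SNE features: pre-reduce dimensionality if needed, then
    project train and test jointly for each perplexity value."""
    X_all = np.vstack([X_train, X_test])

    # If there are many features (> 500), pre-reduce them with PCA
    # (e.g. to 100 components), since t-SNE is slow in high dimensions.
    if X_all.shape[1] > 500:
        X_all = PCA(n_components=100, random_state=random_state).fit_transform(X_all)

    # t-SNE is stochastic and sklearn's TSNE has no transform() for new
    # points, so train and test are embedded together, not separately.
    embeddings = [
        TSNE(n_components=2, perplexity=p, random_state=random_state).fit_transform(X_all)
        for p in perplexities
    ]
    projected = np.hstack(embeddings)

    # Concatenate the projected coordinates to the original feature matrices.
    n_train = X_train.shape[0]
    new_train = np.hstack([X_train, projected[:n_train]])
    new_test = np.hstack([X_test, projected[n_train:]])
    return new_train, new_test
```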
In conclusion, I want to remind you of the main points of the lecture. t-SNE is an excellent tool for visualizing data. If data has an explicit structure, it will likely be reflected on the t-SNE projection. However, you need to be cautious with the interpretation of t-SNE results: sometimes you can see structure where it does not exist or, vice versa, see none where structure is actually present. It is good practice to do several t-SNE projections with different perplexities. And in addition to EDA, t-SNE works very well as a source of features for feeding into models. Thank you for your attention.