1
00:00:00,000 --> 00:00:04,182
If you're working on a brand new machine learning application,

2
00:00:04,182 --> 00:00:06,840
one of the piece of advice I often give people is that,

3
00:00:06,840 --> 00:00:11,005
I think you should build your first system quickly and then iterate.

4
00:00:11,005 --> 00:00:14,895
Let me show you what I mean. I've worked on speech recognition for many years.

5
00:00:14,895 --> 00:00:18,385
And if you're thinking of building a new speech recognition system,

6
00:00:18,385 --> 00:00:20,430
there's actually a lot of directions you could

7
00:00:20,430 --> 00:00:22,976
go and a lot of things you could prioritize.

8
00:00:22,976 --> 00:00:25,590
For example, there are specific techniques for making

9
00:00:25,590 --> 00:00:29,235
speech recognition systems more robust to noisy background.

10
00:00:29,235 --> 00:00:32,340
And noisy background could mean cafe noise,

11
00:00:32,340 --> 00:00:35,430
like a lot of people talking in the background or car noise,

12
00:00:35,430 --> 00:00:38,930
the sounds of cars and highways or other types of noise.

13
00:00:38,930 --> 00:00:43,440
There are ways to make a speech recognition system more robust to accented speech.

14
00:00:43,440 --> 00:00:48,311
There are specific problems associated with speakers that are far from the microphone,

15
00:00:48,311 --> 00:00:50,705
this is called far-field speech recognition.

16
00:00:50,705 --> 00:00:53,610
Young children speech poses special challenges,

17
00:00:53,610 --> 00:00:56,535
both in terms of how they pronounce individual words as well

18
00:00:56,535 --> 00:00:59,820
as their choice of words and the vocabulary they tend to use.

19
00:00:59,820 --> 00:01:07,130
And if sometimes the speaker stutters or if they use nonsensical phrases like oh, ah,

20
00:01:07,130 --> 00:01:09,960
um, there are different choices and

21
00:01:09,960 --> 00:01:12,940
different techniques for making the transcript that you output,

22
00:01:12,940 --> 00:01:15,310
still read more fluently.

23
00:01:15,310 --> 00:01:17,880
So, there are these and

24
00:01:17,880 --> 00:01:22,710
many other things you could do to improve a speech recognition system.

25
00:01:22,710 --> 00:01:26,693
And more generally, for almost any machine learning application,

26
00:01:26,693 --> 00:01:30,030
there could be 50 different directions you could go in

27
00:01:30,030 --> 00:01:34,650
and each of these directions is reasonable and would make your system better.

28
00:01:34,650 --> 00:01:35,955
But the challenge is,

29
00:01:35,955 --> 00:01:38,990
how do you pick which of these to focus on.

30
00:01:38,990 --> 00:01:42,970
And even though I've worked in speech recognition for many years,

31
00:01:42,970 --> 00:01:46,075
if I'm building a new system for a new application domain,

32
00:01:46,075 --> 00:01:48,730
I would still find it maybe a little bit difficult to

33
00:01:48,730 --> 00:01:52,625
pick without spending some time thinking about the problem.

34
00:01:52,625 --> 00:01:54,550
So what we recommend you do,

35
00:01:54,550 --> 00:01:58,570
if you're starting on building a brand new machine learning application,

36
00:01:58,570 --> 00:02:02,277
is to build your first system quickly and then iterate.

37
00:02:02,277 --> 00:02:04,720
What I mean by that is I recommend that

38
00:02:04,720 --> 00:02:08,470
you first quickly set up a dev/test set and metric.

39
00:02:08,470 --> 00:02:12,360
So this is really deciding where to place your target.

40
00:02:12,360 --> 00:02:14,560
And if you get it wrong, you can always move it later,

41
00:02:14,560 --> 00:02:16,695
but just set up a target somewhere.

42
00:02:16,695 --> 00:02:20,920
And then I recommend you build an initial machine learning system quickly.

43
00:02:20,920 --> 00:02:23,248
Find the training set, train it and see.

44
00:02:23,248 --> 00:02:25,180
Start to see and understand how well you're

45
00:02:25,180 --> 00:02:29,475
doing against your dev/test set and your values and metric.

46
00:02:29,475 --> 00:02:32,633
When you build your initial system,

47
00:02:32,633 --> 00:02:37,180
you then be able to use bias/variance analysis which we talked about

48
00:02:37,180 --> 00:02:42,470
earlier as well as error analysis which we talked about just in the last several videos,

49
00:02:42,470 --> 00:02:45,260
to prioritize the next steps.

50
00:02:45,260 --> 00:02:49,320
In particular, if error analysis

51
00:02:49,320 --> 00:02:52,780
causes you to realize that a lot of the errors are

52
00:02:52,780 --> 00:02:55,675
from the speaker being very far from the microphone,

53
00:02:55,675 --> 00:02:58,342
which causes special challenges to speech recognition,

54
00:02:58,342 --> 00:03:03,990
then that will give you a good reason to focus on techniques to address this called

55
00:03:03,990 --> 00:03:06,640
far-field speech recognition which

56
00:03:06,640 --> 00:03:10,865
basically means handling when the speaker is very far from the microphone.

57
00:03:10,865 --> 00:03:14,693
Of all the value of building this initial system,

58
00:03:14,693 --> 00:03:16,737
it can be a quick and dirty implementation,

59
00:03:16,737 --> 00:03:18,120
you know, don't overthink it,

60
00:03:18,120 --> 00:03:22,690
but all the value of the initial system is having some learned system,

61
00:03:22,690 --> 00:03:26,497
having some trained system allows you to localize bias/variance,

62
00:03:26,497 --> 00:03:28,255
to try to prioritize what to do next,

63
00:03:28,255 --> 00:03:30,270
allows you to do error analysis,

64
00:03:30,270 --> 00:03:31,480
look at some mistakes,

65
00:03:31,480 --> 00:03:34,630
to figure out all the different directions you can go in,

66
00:03:34,630 --> 00:03:37,822
which ones are actually the most worthwhile.

67
00:03:37,822 --> 00:03:44,125
So to recap, what I recommend you do is build your first system quickly, then iterate.

68
00:03:44,125 --> 00:03:47,050
This advice applies less strongly if you're working on

69
00:03:47,050 --> 00:03:52,300
an application area in which you have significant prior experience.

70
00:03:52,300 --> 00:03:56,080
It also implies to build less strongly if there's a significant body of

71
00:03:56,080 --> 00:03:58,480
academic literature that you can draw on

72
00:03:58,480 --> 00:04:01,425
for pretty much the exact same problem you're building.

73
00:04:01,425 --> 00:04:05,810
So, for example, there's a large academic literature on face recognition.

74
00:04:05,810 --> 00:04:08,185
And if you're trying to build a face recognizer,

75
00:04:08,185 --> 00:04:11,725
it might be okay to build a more complex system from the get-go

76
00:04:11,725 --> 00:04:16,595
by building on this large body of academic literature.

77
00:04:16,595 --> 00:04:19,990
But if you are tackling a new problem for the first time,

78
00:04:19,990 --> 00:04:23,235
then I would encourage you to really not

79
00:04:23,235 --> 00:04:27,010
overthink or not make your first system too complicated.

80
00:04:27,010 --> 00:04:30,070
Well, just build something quick and dirty and then use that

81
00:04:30,070 --> 00:04:33,447
to help you prioritize how to improve your system.

82
00:04:33,447 --> 00:04:36,670
So I've seen a lot of machine learning projects and I've

83
00:04:36,670 --> 00:04:40,465
seen some teams overthink the solution and build something too complicated.

84
00:04:40,465 --> 00:04:44,335
I've also seen some teams underthink and then build something maybe too simple.

85
00:04:44,335 --> 00:04:46,240
Well on average, I've seen a lot more

86
00:04:46,240 --> 00:04:49,315
teams overthink and build something too complicated.

87
00:04:49,315 --> 00:04:52,275
And I've seen teams build something too simple.

88
00:04:52,275 --> 00:04:53,920
So I hope this helps,

89
00:04:53,920 --> 00:04:58,583
and if you are applying to your machine learning algorithms to a new application,

90
00:04:58,583 --> 00:05:01,840
and if your main goal is to build something that works,

91
00:05:01,840 --> 00:05:04,720
as opposed to if your main goal is to invent

92
00:05:04,720 --> 00:05:08,020
a new machine learning algorithm which is a different goal,

93
00:05:08,020 --> 00:05:11,075
then your main goal is to get something that works really well.

94
00:05:11,075 --> 00:05:13,360
I'd encourage you to build something quick and dirty.

95
00:05:13,360 --> 00:05:14,600
Use that to do bias/variance analysis,

96
00:05:14,600 --> 00:05:17,890
use that to do error analysis and

97
00:05:17,890 --> 00:05:23,510
use the results of those analysis to help you prioritize where to go next.