You've now learned so much about deep learning and sequence models that we can actually describe a trigger word detection system quite simply, just on one slide, as you'll see in this video. With the rise of speech recognition, there are more and more devices you can wake up with your voice, and those are sometimes called trigger word detection systems. So let's see how you can build one.

Examples of trigger word systems include Amazon Echo, which is woken up with the word "Alexa"; Baidu's DuerOS-powered smart devices, woken up with their own wake phrase; Apple Siri, woken up with "Hey Siri"; and Google Home, woken up with "Okay Google". The idea of trigger word detection is that if you have, say, an Amazon Echo in your living room, you can walk into the living room and just say, "Alexa, what time is it?", and have it wake up, triggered by the word "Alexa", and answer your voice query. So if you can build a trigger word detection system, maybe you can make your computer do something by telling it, "Computer, activate." One of my friends also works on turning on and off a particular lamp using a trigger word, kind of as a fun project. But what I want to show you is how you can build a trigger word detection system.

Now, the literature on trigger word detection algorithms is still evolving, so there isn't yet a single, universally agreed-on algorithm, or wide consensus on what the best approach is. So I'm just going to show you one example of an algorithm you can use.

You've seen RNNs like this, and what we really do is take an audio clip, maybe compute spectrogram features, which generates audio features x<1>, x<2>, x<3>, ..., and pass those through an RNN. All that remains to be done is to define the target labels y.
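As a minimal sketch of that architecture (the use of Keras, the GRU size, and the input dimensions are assumptions for illustration, not something prescribed in this video), the spectrogram frames go through a recurrent layer, and a sigmoid unit produces one output per time step:

```python
# Minimal sketch: spectrogram features x<1>, ..., x<Tx> through an RNN,
# with a per-time-step sigmoid output y_hat<t> = P(trigger word just ended at t).
from tensorflow.keras.layers import Input, GRU, TimeDistributed, Dense
from tensorflow.keras.models import Model

def build_trigger_word_model(Tx, n_freq):
    """Tx: number of spectrogram time steps; n_freq: frequency bins per step."""
    X_in = Input(shape=(Tx, n_freq))                          # spectrogram of one audio clip
    h = GRU(128, return_sequences=True)(X_in)                 # hidden state at every time step
    y = TimeDistributed(Dense(1, activation="sigmoid"))(h)    # per-step trigger probability
    return Model(inputs=X_in, outputs=y)

model = build_trigger_word_model(Tx=5511, n_freq=101)         # example sizes, chosen for illustration
model.compile(optimizer="adam", loss="binary_crossentropy")
```

Because there is one label per time step, binary cross-entropy summed over the time steps is a natural training loss for this setup.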
If this point in the audio clip is when someone has just finished saying the trigger word, such as "Alexa", "Hey Siri", or "Okay Google", then in the training set you can set the target labels to be zero for everything before that point, and right after it set the target label to one. Then if, a little bit later, the trigger word is said again at this point, you can again set the target label to be one right after that.

Now, this type of labeling scheme for an RNN could work; in fact, it will work reasonably well. One slight disadvantage is that it creates a very imbalanced training set, with a lot more zeros than ones. So one other thing you could do, and this is a little bit of a hack, but it can make the model a bit easier to train, is that instead of setting only a single time step to output one, you can have it output one for several time steps, or for a fixed period of time, before reverting back to zero. That somewhat evens out the ratio of ones to zeros, though it is a little bit of a hack. So if this is the point in the audio clip where the trigger word was said, then right after it you set the target label to one, and if this is where the trigger word is said again, then right after that is again where you want the RNN to output one. You'll get to play with more of this in the programming exercise.
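As a rough illustration of this labeling scheme (the function name, window width, and sizes are assumptions for illustration), you might build the target sequence y by leaving it at zero everywhere and writing a fixed-width run of ones right after each point where the trigger word ends:

```python
import numpy as np

def make_labels(Ty, trigger_end_steps, ones_width=50):
    """Ty: number of output time steps; trigger_end_steps: steps at which the
    trigger word has just been said; ones_width: how many steps to label 1."""
    y = np.zeros((Ty, 1))
    for t_end in trigger_end_steps:
        # Label a short stretch of ones right after the trigger word ends,
        # which partly evens out the ratio of ones to zeros.
        y[t_end + 1 : min(t_end + 1 + ones_width, Ty), 0] = 1.0
    return y

# Example: the trigger word ends at steps 1200 and 3500 of a 5511-step clip
# (Ty here matches the RNN's output length in the sketch above).
y = make_labels(Ty=5511, trigger_end_steps=[1200, 3500])
```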
So I think you should feel quite proud of yourself. You've learned enough about deep learning that it takes just one picture, one slide, to describe something as complicated as trigger word detection, and based on this I hope you'll be able to implement something that works and lets you detect trigger words; you'll see more of this in the programming exercise. So that's it for trigger words. I hope you feel quite proud of how much you've learned about deep learning, that you can now describe trigger word detection in just one slide, in a few minutes, and that you can hopefully implement it, get it to work, and maybe even make it do something fun in your house, like turning something on or off, say a computer, when you or someone else says the trigger word.

This is the last technical video of this course. To wrap up: in this course on sequence models, you learned about RNNs, including both GRUs and LSTMs. Then in the second week you learned a lot about word embeddings and how they learn representations of words. And in this week you learned about the attention model, as well as how to use it to process audio data. I hope you have fun implementing all of these ideas in this week's programming exercises. Let's go on to the last video.