Question 1

Consider using this encoder-decoder model for machine translation.

This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence $$x$$.
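For reference, a “conditional language model” factors the probability of an output sentence $$y$$ conditioned on an input sentence $$x$$:

$$P(y \mid x) = \prod_{t=1}^{T_y} P(y^{\langle t \rangle} \mid x, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle})$$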


Question 2

In beam search, if you increase the beam width $$B$$, which of the following would you expect to be true? Check all that apply.
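As a refresher on what the beam width controls, here is a minimal beam-search sketch (the scorer `log_prob_next` is a hypothetical stand-in for the decoder, not something from the quiz). At each step it keeps only the $$B$$ highest-scoring partial hypotheses, so a larger $$B$$ considers more possibilities at the cost of more memory and slower decoding.

```python
import heapq

def beam_search(log_prob_next, vocab, start, end, B=3, max_len=20):
    """Keep the B best partial hypotheses at every step.

    log_prob_next(seq, w) is a hypothetical scorer returning
    log P(w | x, seq). A larger beam width B keeps more partial
    translations alive: more computation and memory, but a better
    chance of finding a high-probability output y.
    """
    beam = [(0.0, [start])]          # (cumulative log-prob, sequence)
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beam:
            for w in vocab:
                candidates.append((score + log_prob_next(seq, w), seq + [w]))
        # prune: keep only the B highest-scoring expansions
        beam = heapq.nlargest(B, candidates, key=lambda c: c[0])
        completed += [c for c in beam if c[1][-1] == end]
        beam = [c for c in beam if c[1][-1] != end]
        if not beam:
            break
    return max(completed + beam, key=lambda c: c[0])
```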


Question 3

In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
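The reason: without normalization, beam search maximizes $$\sum_{t=1}^{T_y} \log P(y^{\langle t \rangle} \mid x, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle})$$, and every term is $$\le 0$$, so each additional word can only lower the score. Length normalization instead maximizes the per-word average

$$\frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P(y^{\langle t \rangle} \mid x, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle}),$$

where the course suggests a softened exponent around $$\alpha = 0.7$$.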


Question 4

Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip $$x$$ to a text transcript $$y$$. Your algorithm uses beam search to try to find the value of $$y$$ that maximizes $$P(y \mid x)$$.

On a dev set example, given an input audio clip, your algorithm outputs the transcript $$\hat{y}=$$ “I’m building an A Eye system in Silly con Valley.”, whereas a human gives a much superior transcript $$y^* =$$ “I’m building an AI system in Silicon Valley.”

According to your model,

$$P(\hat{y} \mid x) = 1.09 \times 10^{-7}$$

$$P(y^* \mid x) = 7.21 \times 10^{-8}$$

Would you expect increasing the beam width $$B$$ to help correct this example?
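A useful reminder for this question is the error-analysis heuristic from the course: if $$P(y^* \mid x) > P(\hat{y} \mid x)$$, beam search failed to find the highest-probability output and is at fault; otherwise the RNN model is the likelier culprit. Here

$$P(\hat{y} \mid x) = 1.09 \times 10^{-7} > 7.21 \times 10^{-8} = P(y^* \mid x),$$

so beam search already returned the output the model scores higher.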


Question 5

Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, $$P(y^* \mid x) > P(\hat{y} \mid x)$$. This suggests you should focus your attention on improving the search algorithm.


Question 6

Consider the attention model for machine translation.

Further, here is the formula for $$\alpha^{\langle t,t' \rangle}$$:

$$\alpha^{\langle t,t' \rangle} = \frac{\exp(e^{\langle t,t' \rangle})}{\sum_{t'=1}^{T_x} \exp(e^{\langle t,t' \rangle})}$$

Which of the following statements about $$\alpha^{\langle t,t' \rangle}$$ are true? Check all that apply.
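To make the formula concrete, here is a minimal NumPy sketch (array names are mine, not from the quiz) computing the attention weights for one output step $$t$$ from the scores $$e^{\langle t,t' \rangle}$$; note the resulting weights are non-negative and sum to 1 over $$t'$$.

```python
import numpy as np

def attention_weights(e):
    """Softmax over the input timesteps t'.

    e: array of shape (Tx,), the scores e^<t,t'> for one fixed
    output step t. Returns alpha^<t,t'> of the same shape,
    non-negative and summing to 1.
    """
    e = e - e.max()                  # stabilize the exponentials
    w = np.exp(e)
    return w / w.sum()

alpha = attention_weights(np.array([0.2, 1.5, -0.3]))
assert np.isclose(alpha.sum(), 1.0)  # the sum over t' is 1
```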


Question 7

The network learns where to “pay attention” by learning the values $$e^{\langle t,t' \rangle}$$, which are computed using a small neural network that takes the previous decoder hidden state $$s^{\langle t-1 \rangle}$$ and the encoder activation $$a^{\langle t' \rangle}$$ as inputs.

We can't replace $$s^{\langle t-1 \rangle}$$ with $$s^{\langle t \rangle}$$ as an input to this neural network. This is because $$s^{\langle t \rangle}$$ depends on $$\alpha^{\langle t,t' \rangle}$$, which in turn depends on $$e^{\langle t,t' \rangle}$$; so at the time we need to evaluate this network, we haven’t computed $$s^{\langle t \rangle}$$ yet.
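For concreteness, here is a minimal sketch of such a scoring network (the layer sizes and variable names are my own assumptions, not from the quiz): it maps the previous decoder state $$s^{\langle t-1 \rangle}$$ and one encoder activation $$a^{\langle t' \rangle}$$ to the scalar score $$e^{\langle t,t' \rangle}$$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, n_h = 64, 128, 10   # assumed sizes: decoder state, encoder activation, hidden layer
W1 = rng.normal(size=(n_h, n_s + n_a)) * 0.01
b1 = np.zeros(n_h)
W2 = rng.normal(size=(1, n_h)) * 0.01
b2 = np.zeros(1)

def score(s_prev, a_tprime):
    """One-hidden-layer net: e = W2 tanh(W1 [s^<t-1>; a^<t'>] + b1) + b2.

    Note the input is s^<t-1>, not s^<t>: s^<t> is not available yet,
    since it is computed from the attention weights this score feeds into.
    """
    h = np.tanh(W1 @ np.concatenate([s_prev, a_tprime]) + b1)
    return (W2 @ h + b2).item()

e = score(np.zeros(n_s), np.zeros(n_a))  # one score e^<t,t'>
```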


Question 8

Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:


Question 9

Under the CTC model, identical repeated characters not separated by the “blank” character (_) are collapsed. What does the following string collapse to?

__c_oo_o_kk___b_ooooo__oo__kkk
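A minimal sketch of the collapse rule (my own helper, not from the quiz): first merge runs of identical characters, then drop the blanks.

```python
from itertools import groupby

def ctc_collapse(s, blank="_"):
    """Collapse repeats not separated by blank, then remove blanks."""
    merged = "".join(ch for ch, _ in groupby(s))  # e.g. "aa_a" -> "a_a"
    return merged.replace(blank, "")

print(ctc_collapse("__c_oo_o_kk___b_ooooo__oo__kkk"))
```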


Question 10

In trigger word detection, $$x^{\langle t \rangle}$$ is: