Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

https://arxiv.org/abs/1406.1078

TLDR; Similar to Sutskever et al.'s sequence-to-sequence paper on neural machine translation. Encoding and decoding are done by two separate RNNs: the encoder maps a variable-length input sequence to a fixed-length vector, and the decoder maps that fixed-length vector to a variable-length output sequence.

Detailed Notes:

  • The translation architecture uses two separate RNNs: one encodes the input sequence, and the other decodes the encoder's fixed-length summary (the context vector) into the target sequence.

[Figure: RNN Encoder-Decoder architecture]
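A minimal sketch of the two-RNN setup, written in PyTorch with GRU cells (the paper introduces the gated hidden unit that later became known as the GRU). The module names, layer sizes, and the choice to pass the context only as the decoder's initial hidden state are illustrative assumptions, not the authors' implementation; in the paper the context vector is also fed into every decoder step.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Reads a variable-length source sequence and returns a fixed-length context vector."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                  # src: (batch, src_len) of token ids
        _, h_n = self.rnn(self.embed(src))   # h_n: (1, batch, hidden_dim)
        return h_n                           # fixed-length summary of the whole input

class Decoder(nn.Module):
    """Emits the target sequence one token at a time, conditioned on the context."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_token, hidden):   # prev_token: (batch, 1); hidden: (1, batch, hidden_dim)
        output, hidden = self.rnn(self.embed(prev_token), hidden)
        logits = self.out(output)             # (batch, 1, vocab_size); softmax applied in the loss
        return logits, hidden

# Usage: encode once, then decode step by step starting from the context vector.
encoder, decoder = Encoder(vocab_size=15000), Decoder(vocab_size=15000)
src = torch.randint(0, 15000, (2, 7))        # a toy batch of 2 source sentences
hidden = encoder(src)                         # context vector initializes the decoder
prev = torch.zeros(2, 1, dtype=torch.long)    # assumed <bos> token id = 0
logits, hidden = decoder(prev, hidden)        # scores over the next target word
```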

  • The encoder maps a variable-length input sequence to a fixed-length vector; the decoder maps that fixed-length vector to a variable-length output sequence.
  • At each decoding step, a softmax layer gives the probability of the next word of the target sequence. The probability of the entire target sequence is the product of these per-step probabilities, as in the equation below.

[Figure: product of the per-step softmax probabilities]
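The figure presumably showed this factorization; in the paper's notation, with c the fixed-length context vector from the encoder, the target-sequence probability is the product of the per-step (softmax) conditionals:

```latex
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T) \;=\; \prod_{t=1}^{T'} p(y_t \mid y_1, \dots, y_{t-1}, c)
```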

  • The objective is to maximize the log probability of the target sequence given the input sequence.

[Figure: training objective]
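A reconstruction of that objective in the paper's notation: with model parameters \theta and N training pairs (x_n, y_n), maximize the average log-likelihood

```latex
\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}(y_n \mid x_n)
```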

Training Points:

  • 1000 hidden units in each RNN (encoder and decoder)
  • 100-dimensional word embeddings
  • Adadelta with stochastic gradient descent (ρ = 0.95, ε = 10^-6)
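As a concrete illustration of those optimizer settings, a hedged PyTorch sketch continuing the encoder/decoder sketch above (encoder, decoder, and logits come from that block; the learning rate is left at PyTorch's default since the notes don't give one, and the reference words here are toy values):

```python
import torch

# Optimize all encoder/decoder parameters with Adadelta using the rho/epsilon noted above.
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adadelta(params, rho=0.95, eps=1e-6)

# One stochastic-gradient step: cross-entropy against toy reference target words,
# i.e. minimizing the negative log-probability that the objective above maximizes.
targets = torch.randint(0, 15000, (2, 1))
loss = torch.nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```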
