Generating Long and Diverse Responses with Neural Conversation Models

Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, Ray Kurzweil. Generating Long and Diverse Responses with Neural Conversation Models. URL https://openreview.net/pdf?id=HJDdiT9gl

TL;DR: Introduces the target-glimpse model with a stochastic beam-search algorithm to generate long, coherent, and diverse responses for conversations.

Note: Not a math heavy paper but requires understanding of seq-to-seq models for NMT and basics of attentional interfaces and beam-search.

Detailed Notes:

  • The main issue with current conversational models based on traditional sequence-to-sequence architectures is that they produce short, generic responses. This paper explores two modifications. The first is the glimpse model, which trains on fixed-size segments of the target. The second is a segment-oriented stochastic decoder, which produces diverse responses from the very start of decoding.

Traditional Seq-to-Seq Architecture

  • The sequence-to-sequence approach for conversations needs to be different from translation because the task is fundamentally different. With translation, the source provides all the semantic information we need to decode the target sequence. But in conversation, the target response can be very different from the semantic information in the source. The example the paper gives is a short, simple question that may solicit a long, detailed response (e.g., "What did you do today?").

Target Glimpse Model

[Figure: An example of the glimpse model with a fixed target length of 3. Source: Generating Long and Diverse Responses with Neural Conversation Models.]

  • Therefore, using the same seq-to-seq architecture for conversation as for translation leads to a few issues. The first is failing to generate long responses; even if we explicitly encourage longer responses (e.g., via length normalization), the results are incoherent. This is largely because the decoder has to compress all of its previous outputs into a single hidden state vector. The paper proposes adding target-side attention to the decoder, which lets it consider the target sequence generated so far. They further extend this idea into the "glimpse model", which trains the decoder on fixed-length segments of the target, so target-side attention can be applied to very long responses without running into memory issues (a small sketch of the segmenting idea follows the figure below).
[Figure: Overall steps of the target-glimpse model.]
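To make the segmenting idea concrete, here is a minimal sketch in plain Python. The function name, the toy token lists, and the glimpse length are my own illustrative choices, not the authors' code; the point is just how a long target can be cut into fixed-length glimpses, with the target prefix that precedes each glimpse moved over to the encoder side so attention can still cover it.

```python
def make_glimpse_examples(source_tokens, target_tokens, k=3):
    """Split a long target into fixed-length glimpses of length k.

    For each glimpse, the part of the target that precedes it is
    appended to the source, so the decoder only has to model k target
    tokens at a time while the (attention-augmented) encoder sees the
    rest. Illustrative sketch only.
    """
    examples = []
    for start in range(0, len(target_tokens), k):
        glimpse = target_tokens[start:start + k]
        extended_source = source_tokens + target_tokens[:start]
        examples.append((extended_source, glimpse))
    return examples


# A 5-token response with k=3 yields two training segments.
examples = make_glimpse_examples(
    ["what", "did", "you", "do", "today", "?"],
    ["i", "went", "to", "the", "park"],
    k=3,
)
for src, tgt in examples:
    print(src, "->", tgt)
```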

  • We want to use this target-side attention to generate long and coherent responses. The current methods for promoting coherence and diversity, such as length normalization coupled with reranking the N-best maximum a posteriori (MAP) responses using diversity-promoting features, break down when applied to long sequence generation. This paper instead breaks the generation into short segments and reranks candidates segment by segment, which introduces diversity much earlier in the decoding process.

Stochastic Decoding with Segment-by-Segment Reranking

  • A quick review of beam search: at each decoder step we keep the B highest-scoring partial sequences. For each of them we consider every token that could be appended, and keep the top B extensions overall (scored by the sum of log probabilities, i.e. the product of the probabilities). Since we want to increase diversity, the paper replaces this deterministic top-B selection with a stochastic sampling operation (see the sketch after the figure below).
[Figure: Overall steps of the stochastic decoding.]
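Below is a minimal sketch of one such stochastic decoding step, assuming a generic expand_fn that asks the model for next-token log-probabilities. The names, the candidate-pool trick, and the exact sampling scheme are my own simplifications for illustration, not the paper's algorithm verbatim; the key difference from standard beam search is that extensions are sampled rather than picked deterministically.

```python
import numpy as np

def stochastic_beam_step(beams, expand_fn, beam_size=2, pool_size=10):
    """One decoding step where extensions are sampled, not argmax'd.

    `beams` is a list of (token_list, cumulative_log_prob) pairs.
    `expand_fn(tokens)` returns candidate (next_token, log_prob) pairs
    from the model. Instead of deterministically keeping the top-B
    extensions, we sample B of the top `pool_size` extensions in
    proportion to their probabilities, which injects diversity early.
    Illustrative sketch only.
    """
    candidates = []
    for tokens, score in beams:
        for token, logp in expand_fn(tokens):
            candidates.append((tokens + [token], score + logp))

    # Restrict to a pool of the best extensions, then sample within it.
    candidates.sort(key=lambda c: c[1], reverse=True)
    pool = candidates[:pool_size]
    scores = np.array([s for _, s in pool])
    probs = np.exp(scores - scores.max())  # stable softmax over pool scores
    probs /= probs.sum()
    picked = np.random.choice(len(pool), size=min(beam_size, len(pool)),
                              replace=False, p=probs)
    return [pool[i] for i in picked]
```

In the paper the decoding additionally proceeds segment by segment: once a segment is completed, the candidate segments are reranked (the reranking term is touched on in the next note) and decoding continues from the chosen segments.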

  • The set \Phi is a set of Q randomly sampled source sequences. This amounts to a Monte Carlo approximation: the sampled sources are treated as unbiased draws from P(X), so averaging over them approximates the marginal probability of a candidate segment.
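If I read the reranking correctly, the Q sampled sources are used to estimate how likely a candidate segment is regardless of the actual source, i.e. a Monte Carlo estimate of the marginal P(Y); segments that score highly against random sources are presumably generic and can be penalized during reranking. Here is a small sketch of that estimate, with variable names of my own choosing:

```python
import numpy as np

def estimate_log_marginal(logprobs_given_random_sources):
    """Monte Carlo estimate of log P(Y) for a candidate segment Y.

    Input: log P(Y | X') for each of the Q random sources X' in Phi.
    Treating the X' as unbiased samples from P(X), the average of
    P(Y | X') approximates the marginal P(Y). Computed with a stable
    log-sum-exp. Illustrative sketch, not the paper's exact scoring.
    """
    logps = np.asarray(logprobs_given_random_sources, dtype=float)
    q = len(logps)
    m = logps.max()
    return m + np.log(np.exp(logps - m).sum()) - np.log(q)
```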

Unique Points:

  1. Model details are elaborated in the paper, but a few of the reported parameter settings are Q=15, H=10, B=2, D=10, K=10.
  2. Quite a few large-scale experiments were run with this new architecture. Information on the evaluation criteria and performance can be found in Sections 5.1 and 5.2, respectively. However, I do want to highlight the datasets used/created, which the authors propose as a new standard for testing conversational models, since previous datasets were limited in scope and size:
  • Reddit data (1.7 billion messages, 221 million conversations).
  • 2009 Open Subtitles data (0.5 million conversations).
  • The Stack Exchange data (0.8 million conversations).
  • 17 million conversations from the web.

Takeaway

  • The results were noticeably more diverse and coherent than the previous state of the art, though there is still work to do: some redundancy and incoherence remains in long responses. Still, the paper shows how these two architectural changes can have a drastic impact on conversation quality. Additionally, the large datasets introduced in the paper give us a chance to train future models at different data scales.
  • This is a big step towards improving conversations in terms of diversity and long, coherent responses. I will be implementing this model on the new datasets and performing some ablation studies with the goal of improving on the architecture.

(*) Follow me on Twitter for all updates and some larger projects I won’t be posting on this tutorials blog.

2 thoughts on “Generating Long and Diverse Responses with Neural Conversation Models”

  1. Shin Jie says:

    The original paper was saying the same thing in different ways repeatedly and it got quite confusing. Thanks for laying out the steps and I’d love to compare performance on the reddit sets once I am finished with my implementation as well (Theano).


    • gokumohandas says:

      Sure, PM’d you some of my initial results (almost no parameter optimization done yet). I’ve actually found that papers repeating the same underlying message in different ways help me understand them better; often I don’t really see the architecture until it’s described in a way I’m used to. But glad you found this helpful!

