Exploring Sparsity in Recurrent Neural Networks

Sharan Narang, Greg Diamos, Shubho Sengupta & Erich Elsen
https://arxiv.org/abs/1704.05119
TLDR; Reduce the number of parameters in your large recurrent models while maintaining high performance.
Summary: The issue with large models is that they are hard to deploy on embedded devices or mobile phones. Even if your inference speed is acceptable, these …
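The pruning idea the post summarizes can be sketched in a few lines: zero out weights whose magnitude falls below a threshold and keep re-applying the mask as training continues. The numpy snippet below is a minimal illustration of magnitude-based pruning with a gradually increasing threshold; the matrix size, threshold schedule, and function name are illustrative and not the paper's exact scheme.

```python
import numpy as np

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose magnitude is below `threshold`.

    Returns the pruned weights and the binary mask, so the mask can be
    re-applied after each gradient update to keep pruned weights at zero.
    """
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

# Illustrative use: prune a recurrent weight matrix with a threshold that
# grows over "training", so sparsity increases gradually.
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(256, 256))
for step, threshold in enumerate(np.linspace(0.0, 0.05, num=5)):
    W_hh, mask = prune_by_magnitude(W_hh, threshold)
    print(f"step {step}: sparsity = {1.0 - mask.mean():.2%}")
```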

Overcoming Catastrophic Forgetting in Neural Networks

James Kirkpatrick, Raia Hadsell, et al.
https://arxiv.org/abs/1612.00796
TLDR; Catastrophic forgetting is the loss of key information needed to solve a previous task when training on a new task. However, there are several approaches to combat this issue and allow for continual learning. In this post we will look at DeepMind's elastic …
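For reference, the elastic weight consolidation (EWC) objective introduced in the paper adds a quadratic penalty that anchors each parameter to its value after the old task A, weighted by its Fisher information, while training on the new task B:

```latex
\mathcal{L}(\theta) = \mathcal{L}_B(\theta)
  + \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2
```

Here \mathcal{L}_B is the loss on task B, F_i is the i-th diagonal entry of the Fisher information matrix, \theta^{*}_{A} are the parameters learned on task A, and \lambda controls how strongly old knowledge is protected.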

Opening the Black Box of Deep Neural Networks via Information

Ravid Shwartz-Ziv and Naftali Tishby
https://arxiv.org/abs/1703.00810
TLDR; Stochastic gradient descent (SGD) has two distinct stages: empirical error minimization (ERM) and representation compression. This paper explores these stages in an attempt to explain the success of DNNs.
Introduction: It's easy to see how DNNs form a …
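The information-plane analysis in the paper tracks two quantities per layer, I(X;T) and I(T;Y), estimated by discretizing activations. As a rough sketch of that kind of estimator (not the paper's exact procedure), here is a simple histogram-based mutual-information estimate for two scalar variables; the bin count and sample sizes are arbitrary choices:

```python
import numpy as np

def mutual_information(x, y, bins=30):
    """Binning estimate of I(X;Y) in bits for two scalar samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y
    nz = p_xy > 0                           # avoid log(0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + 0.5 * rng.normal(size=5000)         # correlated, so I(X;Y) > 0
print(mutual_information(x, y))
```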

Making Neural Programming Architecture Generalize Via Recursion

Jonathon Cai, Richard Shin, Dawn Song
https://openreview.net/pdf?id=BkbY4psgg
TLDR; Neural networks that try to learn programs have very poor generalizability and interpretability. The authors use recursion to tackle both issues for a variety of sorting algorithms and an arithmetic operation by breaking the task down into smaller modules and …
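As a plain-Python analogy (not the paper's neural programmer-interpreter setup), the recursive decomposition the authors rely on looks like the snippet below: a sort expressed as small subprograms that call themselves on strictly smaller inputs, so the base case plus each recursive step covers inputs of any length:

```python
def quicksort(xs):
    """Toy recursive sort: each call handles only a smaller subproblem."""
    if len(xs) <= 1:                                  # base case
        return xs
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]          # "partition" step
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([5, 2, 9, 1, 5, 6]))
```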

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
https://arxiv.org/abs/1611.01587
TLDR; The joint many-task model tackles multiple NLP tasks with a single architecture. Tasks are layered such that both subsequent and previous tasks benefit from the training of closely related tasks. Though applied to specific NLP …
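A toy PyTorch sketch of the layering idea (dimensions, task choice, and class name are made up for illustration): each task's encoder reads the hidden states of the layer below it, so a higher-level task such as chunking builds directly on the POS layer's representation:

```python
import torch
import torch.nn as nn

class TinyJointModel(nn.Module):
    """Illustrative two-task stack: POS tagging feeds into chunking."""

    def __init__(self, vocab=1000, emb=64, hidden=64, n_pos=17, n_chunk=9):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.pos_lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.pos_head = nn.Linear(2 * hidden, n_pos)
        # The chunking layer sees the word embedding plus the POS layer's states.
        self.chunk_lstm = nn.LSTM(emb + 2 * hidden, hidden,
                                  batch_first=True, bidirectional=True)
        self.chunk_head = nn.Linear(2 * hidden, n_chunk)

    def forward(self, tokens):
        x = self.embed(tokens)
        h_pos, _ = self.pos_lstm(x)
        h_chunk, _ = self.chunk_lstm(torch.cat([x, h_pos], dim=-1))
        return self.pos_head(h_pos), self.chunk_head(h_chunk)

pos_logits, chunk_logits = TinyJointModel()(torch.randint(0, 1000, (2, 12)))
```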

Domain Adaptation in Question Answering

TLDR; NLU tasks (QA, RTE, etc.) can benefit from transfer learning, but the degree of benefit depends on the source task.
https://arxiv.org/abs/1702.02171 (At the time of this post, the authors have renamed and removed the paper, but previous versions are still available here.)
Note: Originally named "Question Answering through Transfer Learning from Large Fine-grained Supervision Data".
Introduction: NLU …
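The transfer-learning recipe discussed in the post boils down to pretraining on a source task and then continuing to train the same weights on the target task. The snippet below is a minimal, hypothetical fine-tuning loop in PyTorch; the model, random data, and learning rates are placeholders rather than the paper's setup:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()

def train(model, batches, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for x, y in batches:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Fake stand-ins for source-task and target-task data.
source = [(torch.randn(32, 128), torch.randint(0, 2, (32,))) for _ in range(10)]
target = [(torch.randn(32, 128), torch.randint(0, 2, (32,))) for _ in range(10)]

train(model, source, lr=1e-3)   # pretrain on the source task (e.g. a large QA corpus)
train(model, target, lr=1e-4)   # fine-tune the same weights on the target NLU task
```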