Important Papers

A Simple Neural Network Module for Relational Reasoning
A Tutorial Introduction to the Minimum Description Length Principle
Attention Is All You Need
Deep Residual Learning for Image Recognition
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Identity Mappings in Deep Residual Networks
ImageNet Classification with Deep Convolutional Neural Networks
Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
Machine Super Intelligence
Multi-Scale Context Aggregation by Dilated Convolutions
Neural Machine Translation by Jointly Learning to Align and Translate
Neural Message Passing for Quantum Chemistry
Neural Turing Machines
Order Matters: Sequence to sequence for sets
Pointer Networks
Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
Recurrent Neural Network Regularization
Relational recurrent neural networks
Scaling Laws for Neural Language Models
The Annotated Transformer
The First Law of Complexodynamics
The Unreasonable Effectiveness of Recurrent Neural Networks
Understanding LSTM Networks
Variational Lossy Autoencoder