Efficiently scaling transformer inference
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …
Machine translation systems based on classical-statistical-deep-learning approaches
Over recent years, machine translation has made remarkable progress, and its importance
has become more evident with the need to understand the information available …
Sparse is enough in scaling transformers
Large Transformer models yield impressive results on many tasks, but are expensive to
train, or even fine-tune, and so slow at decoding that their use and study becomes out of …
Exploring lottery ticket hypothesis in spiking neural networks
Spiking Neural Networks (SNNs) have recently emerged as a new generation of
low-power deep neural networks, which are well suited to implementation on low-power …
Losing Heads in the Lottery: Pruning Transformer
The attention mechanism is the crucial component of the transformer architecture. Recent
research shows that most attention heads are not confident in their decisions and can be …
Gradient flow in sparse neural networks and how lottery tickets win
Sparse Neural Networks (NNs) can match the generalization of dense NNs using a
fraction of the compute/storage for inference, and have the potential to enable efficient …
Super tickets in pre-trained language models: From model compression to improving generalization
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists
of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the …
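To make the lottery-ticket idea in the entry above concrete, here is a minimal sketch of iterative magnitude pruning with weight rewinding. The toy model, data, sparsity level, and training loop are illustrative assumptions, not the setup of any paper listed here.

```python
# Minimal sketch of lottery-ticket-style magnitude pruning with rewinding.
# All model/data choices below are illustrative placeholders.
import copy
import torch
import torch.nn as nn

def magnitude_masks(model, sparsity):
    """Keep the largest-magnitude weights in each weight matrix; mask the rest."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / norm parameters
            continue
        keep = int(param.numel() * (1.0 - sparsity))
        threshold = param.abs().flatten().kthvalue(param.numel() - keep).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

def train(model, steps=100, masks=None):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x = torch.randn(16, 32)
        y = torch.randint(0, 2, (16,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        if masks is not None:
            apply_masks(model, masks)            # keep pruned weights at zero

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
initial_state = copy.deepcopy(model.state_dict())  # weights at initialization

train(model)                                     # 1) train the dense network
masks = magnitude_masks(model, sparsity=0.8)     # 2) prune smallest-magnitude weights
model.load_state_dict(initial_state)             # 3) rewind to the original init
apply_masks(model, masks)
train(model, masks=masks)                        # 4) retrain only the "ticket" weights
```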
Differentiable subset pruning of transformer heads
Multi-head attention, a collection of several attention mechanisms that independently attend
to different parts of the input, is the key ingredient in the Transformer. Recent work has …
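The head-pruning entries above rest on the observation that individual attention heads can be gated and removed. Below is a minimal, hypothetical sketch of multi-head attention with a per-head gate; the class name, sizes, and gating scheme are assumptions for illustration. Setting a gate to zero prunes that head outright, while a relaxed gate in [0, 1] can be learned jointly with the model, in the spirit of differentiable subset pruning.

```python
# Sketch: multi-head attention with a per-head gate that can prune heads.
import torch
import torch.nn as nn

class GatedMultiHeadAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One gate per head; learn it (relaxed) or set hard 0/1 values to prune.
        self.head_gate = nn.Parameter(torch.ones(n_heads))

    def forward(self, x):                          # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, s, self.n_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v                           # (batch, heads, seq, d_head)
        heads = heads * self.head_gate.view(1, -1, 1, 1)   # gate each head
        return self.out(heads.transpose(1, 2).reshape(b, s, -1))

mha = GatedMultiHeadAttention()
x = torch.randn(2, 10, 64)
with torch.no_grad():
    mha.head_gate[2] = 0.0        # "prune" head 2; its contribution is zeroed
print(mha(x).shape)               # torch.Size([2, 10, 64])
```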
The lottery ticket hypothesis for object recognition
Recognition tasks, such as object recognition and keypoint estimation, have seen
widespread adoption in recent years. Most state-of-the-art methods for these tasks use deep …
Small pre-trained language models can be fine-tuned as large models via over-parameterization
By scaling the model size, large pre-trained language models (PLMs) have shown
remarkable performance in various natural language processing tasks, mostly outperforming …