Efficiently scaling transformer inference
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …
Machine translation systems based on classical-statistical-deep-learning approaches
Over recent years, machine translation has made remarkable progress, and its importance
has become more evident with the need to understand the information available …
Sparse is enough in scaling transformers
Large Transformer models yield impressive results on many tasks, but are expensive to
train, or even fine-tune, and so slow at decoding that their use and study becomes out of …
Exploring lottery ticket hypothesis in spiking neural networks
Spiking Neural Networks (SNNs) have recently emerged as a new generation of
low-power deep neural networks, which are well suited to implementation on low-power …
Losing Heads in the Lottery: Pruning Transformer
The attention mechanism is the crucial component of the transformer architecture. Recent
research shows that most attention heads are not confident in their decisions and can be …
Gradient flow in sparse neural networks and how lottery tickets win
Sparse Neural Networks (NNs) can match the generalization of dense NNs using a
fraction of the compute/storage for inference, and have the potential to enable efficient …
Super tickets in pre-trained language models: From model compression to improving generalization
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists
of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the …
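To make the lottery-ticket idea in the entry above concrete, here is a minimal sketch of iterative magnitude pruning with weight rewinding. The toy model, data, sparsity level, and training loop are illustrative assumptions, not the setup of any paper listed here.

```python
# Minimal sketch of lottery-ticket-style magnitude pruning with rewinding.
# All model/data choices below are illustrative placeholders.
import copy
import torch
import torch.nn as nn

def magnitude_masks(model, sparsity):
    """Keep the largest-magnitude weights in each weight matrix; mask the rest."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / norm parameters
            continue
        keep = int(param.numel() * (1.0 - sparsity))
        threshold = param.abs().flatten().kthvalue(param.numel() - keep).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

def train(model, steps=100, masks=None):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x = torch.randn(16, 32)
        y = torch.randint(0, 2, (16,))
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        if masks is not None:
            apply_masks(model, masks)            # keep pruned weights at zero

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
initial_state = copy.deepcopy(model.state_dict())  # weights at initialization

train(model)                                     # 1) train the dense network
masks = magnitude_masks(model, sparsity=0.8)     # 2) prune smallest-magnitude weights
model.load_state_dict(initial_state)             # 3) rewind to the original init
apply_masks(model, masks)
train(model, masks=masks)                        # 4) retrain only the "ticket" weights
```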
Differentiable subset pruning of transformer heads
Multi-head attention, a collection of several attention mechanisms that independently attend
to different parts of the input, is the key ingredient in the Transformer. Recent work has …
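The head-pruning entries above rest on the observation that individual attention heads can be gated and removed. Below is a minimal, hypothetical sketch of multi-head attention with a per-head gate; the class name, sizes, and gating scheme are assumptions for illustration. Setting a gate to zero prunes that head outright, while a relaxed gate in [0, 1] can be learned jointly with the model, in the spirit of differentiable subset pruning.

```python
# Sketch: multi-head attention with a per-head gate that can prune heads.
import torch
import torch.nn as nn

class GatedMultiHeadAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One gate per head; learn it (relaxed) or set hard 0/1 values to prune.
        self.head_gate = nn.Parameter(torch.ones(n_heads))

    def forward(self, x):                          # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (b, s, self.n_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v                           # (batch, heads, seq, d_head)
        heads = heads * self.head_gate.view(1, -1, 1, 1)   # gate each head
        return self.out(heads.transpose(1, 2).reshape(b, s, -1))

mha = GatedMultiHeadAttention()
x = torch.randn(2, 10, 64)
with torch.no_grad():
    mha.head_gate[2] = 0.0        # "prune" head 2; its contribution is zeroed
print(mha(x).shape)               # torch.Size([2, 10, 64])
```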
The lottery ticket hypothesis for object recognition
Recognition tasks, such as object recognition and keypoint estimation, have seen
widespread adoption in recent years. Most state-of-the-art methods for these tasks use deep …
Small pre-trained language models can be fine-tuned as large models via over-parameterization
By scaling the model size, large pre-trained language models (PLMs) have shown
remarkable performance in various natural language processing tasks, mostly outperforming …