A survey of techniques for optimizing transformer inference
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
Towards unified deep image deraining: A survey and a new benchmark
Recent years have witnessed significant advances in image deraining due to a variety of
effective image priors and deep learning models. As each deraining approach has …
LLMLingua: Compressing prompts for accelerated inference of large language models
Large language models (LLMs) have been applied in various applications due to their
astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) …
Evo-ViT: Slow-fast token evolution for dynamic vision transformer
Vision transformers (ViTs) have recently received explosive popularity, but the huge
computational cost is still a severe issue. Since the computation complexity of ViT is …
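To make the token-reduction idea behind entries like this one concrete, the following is a minimal Python/PyTorch sketch of attention-guided token pruning in a vision transformer: patch tokens are ranked by the attention the [CLS] token pays to them and only the top fraction is kept, which shrinks the quadratic self-attention cost of later layers. The function name, shapes, and keep ratio are illustrative assumptions, and this is not the slow-fast token evolution scheme of Evo-ViT itself.

    import torch

    def prune_tokens(tokens, cls_attn, keep_ratio=0.5):
        """Keep the most informative patch tokens, ranked by how much attention
        the [CLS] token pays to each patch (a common importance proxy).

        tokens:   (B, 1 + N, D) -- [CLS] token followed by N patch tokens
        cls_attn: (B, N)        -- attention weights from [CLS] to each patch
        """
        B, n_plus_1, D = tokens.shape
        n_patches = n_plus_1 - 1
        n_keep = max(1, int(n_patches * keep_ratio))

        # Indices of the top-k most attended patches, per example in the batch.
        keep_idx = cls_attn.topk(n_keep, dim=1).indices          # (B, n_keep)
        keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, D)      # (B, n_keep, D)

        cls_tok = tokens[:, :1]                                  # (B, 1, D)
        patches = tokens[:, 1:]                                  # (B, N, D)
        kept = patches.gather(1, keep_idx)                       # (B, n_keep, D)

        # Later attention layers now run on 1 + n_keep tokens instead of 1 + N,
        # reducing the quadratic self-attention cost accordingly.
        return torch.cat([cls_tok, kept], dim=1)

    # Toy usage: 196 patch tokens pruned to 98 before the remaining blocks.
    x = torch.randn(2, 197, 384)
    attn = torch.rand(2, 196)
    print(prune_tokens(x, attn, keep_ratio=0.5).shape)  # torch.Size([2, 99, 384])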
Less is more: Focus attention for efficient DETR
DETR-like models have significantly boosted the performance of detectors and even
outperformed classical convolutional models. However, all tokens are treated equally …
The optimal BERT surgeon: Scalable and accurate second-order pruning for large language models
Transformer-based language models have become a key building block for natural
language processing. While these models are extremely accurate, they can be too large and …
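As a rough illustration of what "second-order pruning" refers to, the sketch below scores weights with an Optimal-Brain-Surgeon-style saliency w_i^2 / (2 * [H^-1]_ii), using a diagonal empirical-Fisher approximation of the Hessian. All names and shapes here are hypothetical, and the scalable blocked solver described in the paper is far more involved than this toy version.

    import torch

    def second_order_prune_mask(weights, grad_samples, sparsity=0.5, damp=1e-4):
        """Rank weights by an OBS-style saliency s_i = w_i^2 * H_ii / 2, where the
        Hessian diagonal is approximated by the mean of squared per-sample
        gradients (empirical Fisher). Low-saliency weights are cheapest to prune.

        weights:      (P,) flattened parameter vector
        grad_samples: (S, P) per-sample gradients of the loss w.r.t. the weights
        """
        h_diag = grad_samples.pow(2).mean(dim=0) + damp   # (P,) diagonal Hessian approx.
        saliency = weights.pow(2) * h_diag / 2            # estimated loss increase if pruned
        n_prune = int(sparsity * weights.numel())
        prune_idx = saliency.topk(n_prune, largest=False).indices
        mask = torch.ones_like(weights, dtype=torch.bool)
        mask[prune_idx] = False                           # False => weight is zeroed out
        return mask

    # Toy usage: prune half of a 10k-parameter layer.
    w = torch.randn(10_000)
    g = torch.randn(32, 10_000)
    mask = second_order_prune_mask(w, g)
    print(mask.float().mean())  # roughly 0.5 of the weights survive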
Full stack optimization of transformer inference: a survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
SparseViT: Revisiting activation sparsity for efficient high-resolution vision transformer
High-resolution images enable neural networks to learn richer visual representations.
However, this improved performance comes at the cost of growing computational …
Model tells you what to discard: Adaptive KV cache compression for LLMs
In this study, we introduce adaptive KV cache compression, a plug-and-play method that
reduces the memory footprint of generative inference for Large Language Models (LLMs) …
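For readers unfamiliar with KV cache compression in general, a minimal sketch follows: it evicts cached key/value entries, keeping a recent window plus the most-attended older positions. This is a generic heuristic shown only for illustration; the adaptive, per-head eviction policies of the cited method are not reproduced here, and all tensor shapes and thresholds are assumptions.

    import torch

    def compress_kv_cache(keys, values, attn_scores, recent=64, keep_extra=192):
        """Generic KV-cache eviction: always keep the most recent `recent`
        positions, plus the `keep_extra` older positions that have received the
        most cumulative attention so far.

        keys, values: (T, num_heads, head_dim) cached tensors for one sequence
        attn_scores:  (T,) cumulative attention mass each cached position received
        """
        T = keys.shape[0]
        if T <= recent + keep_extra:
            return keys, values

        old_scores = attn_scores[: T - recent]
        top_old = old_scores.topk(keep_extra).indices.sort().values  # keep original order
        recent_idx = torch.arange(T - recent, T)
        keep = torch.cat([top_old, recent_idx])

        return keys[keep], values[keep]

    # Toy usage: a 1024-entry cache shrinks to 256 entries.
    k = torch.randn(1024, 8, 64)
    v = torch.randn(1024, 8, 64)
    scores = torch.rand(1024)
    k2, v2 = compress_kv_cache(k, v, scores)
    print(k2.shape)  # torch.Size([256, 8, 64])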
Dynamic context pruning for efficient and interpretable autoregressive transformers
Autoregressive Transformers adopted in Large Language Models (LLMs) are hard
to scale to long sequences. Despite several works trying to reduce their computational cost …