Cramming: Training a Language Model on a Single GPU in One Day

J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

Improving transformers with dynamically composable multi-head attention

D Xiao, Q Meng, S Li, X Yuan - arXiv preprint arXiv:2405.08553, 2024 - arxiv.org
Multi-Head Attention (MHA) is a key component of the Transformer. In MHA, attention heads
work independently, causing problems such as the low-rank bottleneck of attention score …
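
For reference alongside this entry, below is a minimal NumPy sketch of standard multi-head attention that makes the "heads work independently" point concrete: each head's attention matrix is computed from its own query/key slice only, with no cross-head interaction before the output projection. The function name, shapes, and weights are illustrative assumptions for this sketch, not the paper's implementation.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
        """Standard MHA sketch. X: (seq_len, d_model); all W_*: (d_model, d_model)."""
        seq_len, d_model = X.shape
        d_head = d_model // n_heads
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        outputs = []
        for h in range(n_heads):
            s = slice(h * d_head, (h + 1) * d_head)
            # Each head's scores depend only on its own Q/K slice: heads are
            # independent until the concatenation and final projection W_o.
            scores = softmax(Q[:, s] @ K[:, s].T / np.sqrt(d_head))
            outputs.append(scores @ V[:, s])
        return np.concatenate(outputs, axis=-1) @ W_o

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        d_model, n_heads, seq_len = 16, 4, 5
        X = rng.normal(size=(seq_len, d_model))
        Ws = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
        print(multi_head_attention(X, *Ws, n_heads=n_heads).shape)  # (5, 16)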

T3SRS: Tensor Train Transformer for compressing sequential recommender systems

H Li, J Zhao, H Huo, S Fang, J Chen, L Yao… - Expert Systems with …, 2024 - Elsevier
In recent years, attention mechanisms have gained popularity in sequential recommender
systems (SRSs) due to their efficiency in capturing dynamic user preferences. However, over …

Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models

M Xu, S Sharmin, DP Mandic - arXiv preprint arXiv:2410.03040, 2024 - arxiv.org
Matrix- and tensor-guided parametrization for Natural Language Processing (NLP) models is
fundamentally useful for improving the model's efficiency. However, the …

Vision Transformer with Irregular Attention

D Ermilov, N Kozyrskiy, I Vorona, AH Phan… - openreview.net
Compression of Transformers is a natural request that arose in the computer vision
community. Apart from quantization, which relies heavily on hardware, sparsification is another …