On statistical rates and provably efficient criteria of latent diffusion transformers (DiTs)

JYC Hu, W Wu, Z Li, S Pi, Z Song… - Advances in Neural …, 2025 - proceedings.neurips.cc
We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs)
under the low-dimensional linear latent space assumption. Statistically, we study the …

Outlier-efficient Hopfield layers for large transformer-based models

JYC Hu, PH Chang, R Luo, HY Chen, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$)
and use it to address the outlier inefficiency problem of training gigantic transformer-based …

Tensor attention training: Provably efficient learning of higher-order transformers

Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16411, 2024 - arxiv.org
Tensor Attention, a multi-view attention that is able to capture high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …
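
(For orientation, and not necessarily the paper's exact notation: one common formulation of tensor attention in this line of work replaces the $n \times n$ score matrix of standard attention with an $n \times n^2$ score matrix built from two key matrices via a column-wise Kronecker product $\oslash$, i.e., up to scaling, $\mathrm{TensorAttn}(Q, K_1, K_2, V_1, V_2) = D^{-1} \exp\big(Q (K_1 \oslash K_2)^\top\big) (V_1 \oslash V_2)$ with $D = \mathrm{diag}\big(\exp(Q (K_1 \oslash K_2)^\top)\, \mathbf{1}_{n^2}\big)$, so that each query attends to pairs of key positions rather than single positions.)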

Uniform memory retrieval with larger capacity for modern Hopfield models

D Wu, JYC Hu, TY Hsiao, H Liu - arXiv preprint arXiv:2404.03827, 2024 - arxiv.org
We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed
$\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a …

On computational limits of modern Hopfield models: A fine-grained complexity analysis

JYC Hu, T Lin, Z Song, H Liu - arXiv preprint arXiv:2402.04520, 2024 - arxiv.org
We investigate the computational limits of the memory retrieval dynamics of modern Hopfield
models through fine-grained complexity analysis. Our key contribution is the …
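
(Background for the two Hopfield entries above, following the standard modern Hopfield formulation rather than either paper's exact variant: with memory patterns $\Xi = [\xi_1, \dots, \xi_M] \in \mathbb{R}^{d \times M}$, inverse temperature $\beta > 0$, and query $x \in \mathbb{R}^d$, one retrieval step is $x^{\mathrm{new}} = \Xi\, \mathrm{softmax}(\beta\, \Xi^\top x)$; evaluating this map for many queries against many stored patterns has the same bilinear structure, and hence essentially the same quadratic cost, as softmax attention, which is the kind of cost the fine-grained analysis concerns.)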

Multi-layer transformers gradient can be approximated in almost linear time

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2408.13233, 2024 - arxiv.org
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …
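
(For context on the complexity being referenced, using the standard formulation rather than the paper's notation: a single self-attention layer computes $\mathrm{Attn}(X) = D^{-1} \exp\big(Q K^\top / \sqrt{d}\big) V$ with $Q = X W_Q$, $K = X W_K$, $V = X W_V$, and $D = \mathrm{diag}\big(\exp(Q K^\top/\sqrt{d})\, \mathbf{1}_n\big)$; the $n \times n$ matrix $\exp(Q K^\top/\sqrt{d})$ makes both the forward pass and exact backpropagation cost $O(n^2 d)$ in the sequence length $n$, which is the barrier the almost-linear-time gradient approximation in this entry is aimed at.)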

HSR-enhanced sparse attention acceleration

B Chen, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2410.10165, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
applications, but their performance on long-context tasks is often limited by the …

Out-of-distribution generalization via composition: a lens through induction heads in transformers

J Song, Z Xu, Y Zhong - Proceedings of the National Academy of Sciences, 2025 - pnas.org
Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving
novel tasks often with a few demonstrations in the prompt. These tasks require the models to …

The closeness of in-context learning and weight shifting for softmax regression

S Li, Z Song, Y Xia, T Yu, T Zhou - arXiv preprint arXiv:2304.13276, 2023 - arxiv.org
Large language models (LLMs) are known for their exceptional performance in natural
language processing, making them highly effective in many human life-related or even job …
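
(Softmax regression, as typically formulated in this line of work, though the paper's exact setup may differ: given $A \in \mathbb{R}^{n \times d}$ and a target $b \in \mathbb{R}^n$, solve $\min_{x \in \mathbb{R}^d} \big\| \langle \exp(Ax), \mathbf{1}_n \rangle^{-1} \exp(Ax) - b \big\|_2$, i.e., fit a normalized exponential of a linear map to $b$; this single-query analogue of one attention row is the model for which the entry compares in-context learning against gradient-style weight shifting.)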