Understanding the expressive power and mechanisms of transformer for sequence modeling

M Wang - arXiv preprint arXiv:2402.00522, 2024 - arxiv.org
We conduct a systematic study of the approximation properties of Transformer for sequence
modeling with long, sparse and complicated memory. We investigate the mechanisms …

Initialization is critical to whether transformers fit composite functions by inference or memorizing

Z Zhang, P Lin, Z Wang, Y Zhang… - arXiv preprint arXiv …, 2024 - ins.sjtu.edu.cn
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

The Buffer Mechanism for Multi-Step Information Reasoning in Language Models

Z Wang, Y Wang, Z Zhang, Z Zhou, H Jin, T Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models have consistently struggled with complex reasoning tasks, such as
mathematical problem-solving. Investigating the internal reasoning mechanisms of these …

Anchor Attention, Small Cache: Code Generation with Large Language Models

X Zhang, Y Zhou, G Yang, HC Gall, T Chen - arXiv preprint arXiv …, 2024 - arxiv.org
The development of large language models (LLMs) has revolutionized automated code
generation. However, their high demand for computational resources has hindered a broader …