Understanding the expressive power and mechanisms of transformer for sequence modeling
M Wang - arXiv preprint arXiv:2402.00522, 2024 - arxiv.org
We conduct a systematic study of the approximation properties of Transformer for sequence
modeling with long, sparse and complicated memory. We investigate the mechanisms …
[PDF] Initialization is critical to whether transformers fit composite functions by inference or memorizing
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …
The Buffer Mechanism for Multi-Step Information Reasoning in Language Models
Z Wang, Y Wang, Z Zhang, Z Zhou, H Jin, T Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models have consistently struggled with complex reasoning tasks, such as
mathematical problem-solving. Investigating the internal reasoning mechanisms of these …
Anchor Attention, Small Cache: Code Generation with Large Language Models
The development of large language models (LLMs) has revolutionized automated code
generation. However, their high demand for computational resources has hindered broader …