Augmented language models: a survey

G Mialon, R Dessì, M Lomeli, C Nalmpantis… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …

Towards revealing the mystery behind chain of thought: a theoretical perspective

G Feng, B Zhang, Y Gu, H Ye, D He… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …
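
To make the prompting setup this line of work analyzes concrete, here is a minimal sketch contrasting a direct prompt with a chain-of-thought demonstration; the question and wording are illustrative, not taken from the paper.

# Illustrative only: a direct prompt vs. a chain-of-thought (CoT) demonstration
# for the same question. CoT prompting has the model emit intermediate steps
# before the final answer.
direct_prompt = (
    "Q: A shop starts with 23 apples, uses 20, then buys 6 more. "
    "How many apples does it have? A:"
)
cot_prompt = (
    "Q: A shop starts with 23 apples, uses 20, then buys 6 more. "
    "How many apples does it have?\n"
    "A: It starts with 23 apples. After using 20 it has 23 - 20 = 3. "
    "After buying 6 more it has 3 + 6 = 9. The answer is 9."
)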

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Y Bai, F Chen, H Wang, C Xiong… - Advances in Neural …, 2023 - proceedings.neurips.cc
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
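
As a rough illustration of in-context algorithm selection (a conceptual sketch, not the paper's construction): the same context of (x, y) examples can be fit by different classical estimators, and which one predicts better depends on the data regime, so a model with this ability would implicitly pick the right one from the prompt alone. The dimensions, noise levels, and ridge penalty below are made-up choices.

import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 40
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))            # in-context inputs
x_q = rng.normal(size=d)               # query input

def ols(X, y):
    # ordinary least squares fit to the context examples
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam=5.0):
    # ridge regression fit to the context examples
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for noise in (0.0, 2.0):               # noiseless vs. noisy context labels
    y = X @ w_star + noise * rng.normal(size=n)
    y_q = x_q @ w_star                 # clean target for the query
    # OLS is exact in the noiseless regime; ridge's shrinkage tends to help
    # (on average) once the labels are noisy.
    print(f"noise={noise}:",
          f"|OLS error|={abs(x_q @ ols(X, y) - y_q):.3f}",
          f"|ridge error|={abs(x_q @ ridge(X, y) - y_q):.3f}")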

Transformers learn to implement preconditioned gradient descent for in-context learning

K Ahn, X Cheng, H Daneshmand… - Advances in Neural …, 2023 - proceedings.neurips.cc
Several recent works demonstrate that transformers can implement algorithms like gradient
descent. By a careful construction of weights, these works show that multiple layers of …
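
A minimal numpy sketch of the kind of construction these works study, for in-context linear regression: with hand-chosen weights, one linear self-attention update on a query token reproduces one preconditioned gradient descent step on the in-context least-squares loss, starting from w = 0. The preconditioner P and the weight layout below are illustrative choices, not the paper's exact parameterization.

import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 3, 8, 0.1
X = rng.normal(size=(n, d))                      # in-context inputs x_1..x_n
y = X @ rng.normal(size=d)                       # in-context labels
x_q = rng.normal(size=d)                         # query input
P = np.diag([1.0, 2.0, 0.5])                     # an arbitrary preconditioner

# One preconditioned GD step on L(w) = 1/(2n) sum_i (w.x_i - y_i)^2 from w = 0
grad = X.T @ (X @ np.zeros(d) - y) / n
pred_gd = x_q @ (-eta * P @ grad)

# One linear self-attention update on tokens e_i = [x_i, y_i], query e_q = [x_q, 0]
E = np.hstack([X, y[:, None]])                   # (n, d+1) context tokens
e_q = np.concatenate([x_q, [0.0]])
W_K = np.hstack([np.eye(d), np.zeros((d, 1))])   # keys read the x part
W_Q = np.hstack([P.T, np.zeros((d, 1))])         # queries apply the preconditioner
W_V = np.zeros((d + 1, d + 1))
W_V[-1, -1] = eta                                # values pick out eta * y_i
scores = (E @ W_K.T) @ (W_Q @ e_q)               # score_i = x_q . (P x_i)
pred_attn = (e_q + (E @ W_V.T).T @ scores / n)[-1]  # y-slot of the updated query

assert np.allclose(pred_gd, pred_attn)           # the two predictions coincide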

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Transformers learn shortcuts to automata

B Liu, JT Ash, S Goel, A Krishnamurthy… - arXiv preprint arXiv …, 2022 - arxiv.org
Algorithmic reasoning requires capabilities which are most naturally understood through
recurrent models of computation, like the Turing machine. However, Transformer models …
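
The "shortcut" idea can be sketched with a toy automaton: instead of updating the state one symbol at a time, which needs depth proportional to the input length, one composes the per-symbol transition maps with a balanced product, which needs only logarithmic depth. The two-state parity automaton below is chosen for simplicity and is not taken from the paper.

import numpy as np

# Transition matrices of a 2-state parity automaton over the alphabet {0, 1}
T = {0: np.eye(2, dtype=int),                 # reading 0 keeps the state
     1: np.array([[0, 1], [1, 0]])}           # reading 1 flips the state
word = [1, 0, 1, 1, 0, 1, 0, 1]
start = np.array([1, 0])                      # one-hot "even" start state

# Recurrent simulation: one step per symbol, sequential depth O(n)
state = start.copy()
for s in word:
    state = state @ T[s]

# "Shortcut": compose the transitions with a balanced tree of products,
# which a non-recurrent model can realize in O(log n) depth
def tree_compose(mats):
    if len(mats) == 1:
        return mats[0]
    mid = len(mats) // 2
    return tree_compose(mats[:mid]) @ tree_compose(mats[mid:])

assert np.array_equal(state, start @ tree_compose([T[s] for s in word]))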

Chain of thought empowers transformers to solve inherently serial problems

Z Li, H Liu, D Zhou, T Ma - arXiv preprint arXiv:2402.12875, 2024 - academia.edu
Instructing the model to generate a sequence of intermediate steps, aka, a chain of thought
(CoT), is a highly effective method to improve the accuracy of large language models (LLMs) …
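
A toy illustration of the "serial" point (a sketch, not the paper's construction): composing a word of permutations is a standard example of a task believed to require depth growing with the input length, yet a chain of thought that records the running composition lets each generated step do only constant work, with the serial depth supplied by the length of the chain rather than by the network.

import random

random.seed(0)
perms = [tuple(random.sample(range(5), 5)) for _ in range(16)]   # input word of permutations of {0,...,4}

def compose(p, q):
    # apply p first, then q: constant work per chain step
    return tuple(q[p[i]] for i in range(5))

# Chain-of-thought style decoding: each emitted "token" is the running
# composition so far; step t only needs the previous token and the t-th input.
chain = [tuple(range(5))]                      # identity permutation
for p in perms:
    chain.append(compose(chain[-1], p))

print(chain[-1])                               # the final composed permutation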

Teaching arithmetic to small transformers

N Lee, K Sreenivasan, JD Lee, K Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models like GPT-4 exhibit emergent capabilities across general-purpose
tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks …
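
A minimal sketch of what "arithmetic as text" training samples can look like; the reversed-answer variant, which emits the sum least-significant digit first so digits appear in carry order, is one of the data formats explored in this line of work. The exact formatting below is illustrative.

import random

random.seed(0)

def addition_example(reverse_answer=False):
    a, b = random.randint(0, 999), random.randint(0, 999)
    ans = str(a + b)
    if reverse_answer:
        ans = ans[::-1]        # least-significant digit first
    return f"{a}+{b}={ans}"

print([addition_example() for _ in range(3)])                     # plain format
print([addition_example(reverse_answer=True) for _ in range(3)])  # reversed format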

In-context convergence of transformers

Y Huang, Y Cheng, Y Liang - arXiv preprint arXiv:2310.05249, 2023 - arxiv.org
Transformers have recently revolutionized many domains in modern machine learning and
one salient discovery is their remarkable in-context learning capability, where models can …

Learning transformer programs

D Friedman, A Wettig, D Chen - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent research in mechanistic interpretability has attempted to reverse-engineer
Transformer models by carefully inspecting network weights and activations. However, these …