Faith and fate: Limits of transformers on compositionality

N Dziri, X Lu, M Sclar, XL Li, L Jiang… - Advances in …, 2024 - proceedings.neurips.cc
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …
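This paper probes compositional tasks whose solutions decompose into a graph of simpler subtasks, with multi-digit multiplication as one running example. Below is a minimal sketch, in our own framing, that makes that subtask structure explicit; the decomposition is just standard long multiplication, not the paper's evaluation code.

```python
# Minimal sketch of the compositional structure behind multi-digit
# multiplication, one of the tasks studied in this paper. The split into
# single-digit partial products plus a final sum is ordinary long
# multiplication; recording each step as an explicit "subtask" is our own
# illustration of the computation-graph view.

def long_multiply(a: int, b: int):
    """Multiply a * b while recording every intermediate subtask result."""
    subtasks = []
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * (10 ** place)   # one single-digit subtask
        subtasks.append((digit, place, partial))
        total += partial                      # aggregation subtask
    return total, subtasks

if __name__ == "__main__":
    result, steps = long_multiply(387, 46)
    for digit, place, partial in steps:
        print(f"digit {digit} at 10^{place}: partial product {partial}")
    print("final answer:", result, "(check:", 387 * 46, ")")
```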

Generative learning for nonlinear dynamics

W Gilpin - Nature Reviews Physics, 2024 - nature.com
Modern generative machine learning models are able to create realistic outputs far beyond
their training data, such as photorealistic artwork, accurate protein structures or …

Towards revealing the mystery behind chain of thought: a theoretical perspective

G Feng, B Zhang, Y Gu, H Ye, D He… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …
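The snippet refers to Chain-of-Thought prompting, where the model is asked to emit intermediate steps before the final answer; the paper's contribution is a theoretical account of why this helps. A minimal illustration of the two prompt styles follows; the wording of the prompts is our own example, not taken from the paper.

```python
# Illustrative contrast between a direct-answer prompt and a chain-of-thought
# prompt for a small arithmetic word problem. The prompt text is our own
# example; the paper analyzes why emitting intermediate steps lets
# bounded-depth transformers express longer computations.

question = ("A shop sells pens in boxes of 12. "
            "How many pens are in 7 boxes plus 5 loose pens?")

direct_prompt = f"Q: {question}\nA:"

cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step.\n"
    "Each box has 12 pens, so 7 boxes hold 7 * 12 = 84 pens.\n"
    "Adding the 5 loose pens gives 84 + 5 = 89.\n"
    "The answer is 89."
)

print(direct_prompt)
print("---")
print(cot_prompt)
```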

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Y Bai, F Chen, H Wang, C Xiong… - Advances in neural …, 2023 - proceedings.neurips.cc
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
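This paper proves that a transformer can implement standard estimators on the in-context examples and select among them. The numpy sketch below spells out one such selection procedure explicitly (ridge regression over several regularization strengths, chosen by held-out error on the prompt data); it is our external illustration of the behavior, not the paper's internal construction.

```python
# A minimal, explicit version of "in-context algorithm selection": fit ridge
# regression with several regularization strengths on the in-context examples
# and keep the one with the lowest validation error. The paper shows a
# transformer can carry out this kind of selection internally; the explicit
# procedure below is our own illustration.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 40
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)

X_train, y_train = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

def ridge(X, y, lam):
    """Closed-form ridge solution (lam = 0 reduces to least squares)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

best_lam, best_err = None, np.inf
for lam in [0.0, 0.1, 1.0, 10.0]:
    w_hat = ridge(X_train, y_train, lam)
    err = np.mean((X_val @ w_hat - y_val) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err

print(f"selected lambda = {best_lam}, validation MSE = {best_err:.3f}")
```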

Representational strengths and limitations of transformers

C Sanford, DJ Hsu, M Telgarsky - Advances in Neural …, 2024 - proceedings.neurips.cc
Attention layers, as commonly used in transformers, form the backbone of modern deep
learning, yet there is no mathematical description of their benefits and deficiencies as …

Hidden progress in deep learning: SGD learns parities near the computational limit

B Barak, B Edelman, S Goel… - Advances in …, 2022 - proceedings.neurips.cc
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
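The task named in the title is learning a k-sparse parity of n bits with SGD: the label is determined by a hidden subset of k coordinates, and nothing about the inputs reveals which ones. A minimal data generator for that task follows; the sizes and variable names are our own choices.

```python
# Minimal data generator for the k-sparse parity task referenced in the title:
# the label is the parity (product over {-1, +1} entries) of a hidden subset
# of k out of n input coordinates. Sizes and names here are our own choices.
import numpy as np

rng = np.random.default_rng(0)
n, k, num_samples = 30, 3, 8
support = rng.choice(n, size=k, replace=False)   # hidden relevant coordinates

X = rng.choice([-1.0, 1.0], size=(num_samples, n))
y = np.prod(X[:, support], axis=1)               # parity over the support

print("hidden support:", sorted(support.tolist()))
print("labels:", y)
```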

Looped transformers as programmable computers

A Giannou, S Rajput, J Sohn, K Lee… - International …, 2023 - proceedings.mlr.press
We present a framework for using transformer networks as universal computers by
programming them with specific weights and placing them in a loop. Our input sequence …
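The snippet describes fixing a transformer's weights by hand and repeatedly feeding its output back as input, so that the loop executes a program over a scratchpad. The sketch below shows only that control flow in the abstract: a fixed update applied until a halt flag is set, with a toy counter standing in for the paper's hand-constructed transformer block.

```python
# Abstract sketch of the "loop a fixed network over a scratchpad" idea: one
# fixed update function is applied repeatedly to a state vector until a halt
# flag is set. The toy update below merely counts down a register; in the
# paper, hand-constructed transformer weights play the role of `fixed_block`.
import numpy as np

def fixed_block(state: np.ndarray) -> np.ndarray:
    """One pass of the (toy) fixed computation: decrement a counter and
    raise the halt flag when it reaches zero."""
    counter, halt = state
    counter = max(counter - 1.0, 0.0)
    halt = 1.0 if counter == 0.0 else 0.0
    return np.array([counter, halt])

state = np.array([5.0, 0.0])    # scratchpad: [counter, halt_flag]
steps = 0
while state[1] == 0.0:          # outer loop re-feeds the output as input
    state = fixed_block(state)
    steps += 1

print(f"halted after {steps} iterations, final state = {state}")
```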

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - arXiv preprint arXiv:2306.09927, 2023 - arxiv.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
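The setting studied here is in-context linear regression: each prompt carries (x, y) pairs drawn from a freshly sampled linear model, followed by a query input whose label the model must predict. A minimal sketch of that data-generating process, together with the least-squares fit on the prompt pairs as the natural in-context baseline, is below; sizes and variable names are our own.

```python
# Minimal sketch of the in-context linear regression setting: each prompt
# carries (x_i, y_i) pairs from a freshly sampled weight vector w, followed by
# a query x whose label must be predicted from the prompt alone. The
# least-squares fit on the prompt pairs is the natural in-context baseline;
# sizes and variable names are our own choices.
import numpy as np

rng = np.random.default_rng(0)
d, num_pairs = 4, 16

w = rng.normal(size=d)                        # task-specific weight vector
X_prompt = rng.normal(size=(num_pairs, d))    # in-context examples
y_prompt = X_prompt @ w
x_query = rng.normal(size=d)

# Least-squares estimate from the prompt alone (noiseless, so it recovers w).
w_hat, *_ = np.linalg.lstsq(X_prompt, y_prompt, rcond=None)

print("true label:     ", float(x_query @ w))
print("in-context pred:", float(x_query @ w_hat))
```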