Sparks of large audio models: A survey and outlook
S Latif, M Shoukat, F Shamshad, M Usama… - arXiv, 2023 - arxiv.org
… and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks …
Faith and fate: Limits of transformers on compositionality
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …
Generative learning for nonlinear dynamics
W Gilpin - Nature Reviews Physics, 2024 - nature.com
Modern generative machine learning models are able to create realistic outputs far beyond
their training data, such as photorealistic artwork, accurate protein structures or …
Towards revealing the mystery behind chain of thought: a theoretical perspective
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …
Transformers as statisticians: Provable in-context learning with in-context algorithm selection
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
Representational strengths and limitations of transformers
Attention layers, as commonly used in transformers, form the backbone of modern deep
learning, yet there is no mathematical description of their benefits and deficiencies as …
Hidden progress in deep learning: SGD learns parities near the computational limit
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
Looped transformers as programmable computers
We present a framework for using transformer networks as universal computers by
programming them with specific weights and placing them in a loop. Our input sequence …
Trained transformers learn linear models in-context
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …