Large language models are zero-shot time series forecasters

N Gruver, M Finzi, S Qiu… - Advances in Neural …, 2023 - proceedings.neurips.cc
By encoding time series as a string of numerical digits, we can frame time series forecasting
as next-token prediction in text. Developing this approach, we find that large language …
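The digit-string framing in this snippet can be illustrated with a minimal sketch (not the authors' code): the fixed-point scaling, space-separated digits, and comma separators between timesteps below are assumptions made only for illustration of the general encode-then-decode idea.

```python
# Illustrative sketch: render a numeric series as text so an LLM can
# forecast it via next-token prediction, then parse the text back.
# The encoding details (fixed-point scaling, digit spacing, " , " separator)
# are assumptions, not the paper's exact scheme.

def encode_series(values, decimals=2, sep=" , "):
    """Render each value as space-separated digits; timesteps joined by `sep`."""
    tokens = []
    for v in values:
        scaled = str(round(abs(v) * 10**decimals))  # fixed-point, drop the decimal point
        digits = " ".join(scaled)                   # e.g. 12.3 -> "1 2 3 0"
        tokens.append(("-" if v < 0 else "") + digits)
    return sep.join(tokens)

def decode_series(text, decimals=2, sep=" , "):
    """Invert encode_series: parse each digit group back into a float."""
    out = []
    for chunk in text.split(sep):
        sign = -1.0 if chunk.strip().startswith("-") else 1.0
        digits = chunk.replace("-", "").replace(" ", "")
        out.append(sign * int(digits) / 10**decimals)
    return out

if __name__ == "__main__":
    series = [12.3, 4.56, -7.0]
    prompt = encode_series(series)
    print(prompt)                 # "1 2 3 0 , 4 5 6 , -7 0 0"
    print(decode_series(prompt))  # [12.3, 4.56, -7.0]
```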

Faith and fate: Limits of transformers on compositionality

N Dziri, X Lu, M Sclar, XL Li, L Jiang… - Advances in …, 2023 - proceedings.neurips.cc
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …

Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …

What can transformers learn in-context? a case study of simple function classes

S Garg, D Tsipras, PS Liang… - Advances in Neural …, 2022 - proceedings.neurips.cc
In-context learning is the ability of a model to condition on a prompt sequence consisting of
in-context examples (input-output pairs corresponding to some task) along with a new query …
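The prompt construction described in this snippet can be sketched briefly (illustrative only): the linear function class, the "input/output" serialization, and the helper names below are assumptions, not the paper's actual setup or code.

```python
# Illustrative sketch of an in-context learning prompt for a simple function
# class: serialize (x, f(x)) example pairs followed by a query input whose
# output the model is asked to complete. All details here are assumptions.
import random

def make_linear_task(n_examples, w=2.0, b=-1.0, seed=0):
    """Sample (x, f(x)) pairs from a hypothetical linear function f(x) = w*x + b."""
    rng = random.Random(seed)
    xs = [round(rng.uniform(-5, 5), 2) for _ in range(n_examples + 1)]
    pairs = [(x, round(w * x + b, 2)) for x in xs[:-1]]
    return pairs, xs[-1]  # in-context examples and a held-out query input

def build_prompt(pairs, query):
    """Serialize examples as 'input: x output: y' lines, ending with the query."""
    lines = [f"input: {x} output: {y}" for x, y in pairs]
    lines.append(f"input: {query} output:")
    return "\n".join(lines)

if __name__ == "__main__":
    pairs, query = make_linear_task(n_examples=4)
    print(build_prompt(pairs, query))
```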

Least-to-most prompting enables complex reasoning in large language models

D Zhou, N Schärli, L Hou, J Wei, N Scales… - arXiv preprint arXiv …, 2022 - arxiv.org
Chain-of-thought prompting has demonstrated remarkable performance on various natural
language reasoning tasks. However, it tends to perform poorly on tasks that require …

Exploring length generalization in large language models

C Anil, Y Wu, A Andreassen… - Advances in …, 2022 - proceedings.neurips.cc
The ability to extrapolate from short problem instances to longer ones is an important form of
out-of-distribution generalization in reasoning tasks, and is crucial when learning from …

Transformers learn shortcuts to automata

B Liu, JT Ash, S Goel, A Krishnamurthy… - arXiv preprint arXiv …, 2022 - arxiv.org
Algorithmic reasoning requires capabilities which are most naturally understood through
recurrent models of computation, like the Turing machine. However, Transformer models …

Combinatorial optimization and reasoning with graph neural networks

Q Cappart, D Chételat, EB Khalil, A Lodi… - Journal of Machine …, 2023 - jmlr.org
Combinatorial optimization is a well-established area in operations research and computer
science. Until recently, its methods have focused on solving problem instances in isolation …

Easy-to-hard generalization: Scalable alignment beyond human supervision

Z Sun, L Yu, Y Shen, W Liu, Y Yang, S Welleck… - arXiv preprint arXiv …, 2024 - arxiv.org
Current AI alignment methodologies rely on human-provided demonstrations or judgments,
and the learned capabilities of AI systems would be upper-bounded by human capabilities …

Transformers can achieve length generalization but not robustly

Y Zhou, U Alon, X Chen, X Wang, R Agarwal… - arXiv preprint arXiv …, 2024 - arxiv.org
Length generalization, defined as the ability to extrapolate from shorter training sequences
to longer test ones, is a significant challenge for language models. This issue persists even …