Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting

M Turpin, J Michael, E Perez… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract: Large Language Models (LLMs) can achieve strong performance on many tasks by
producing step-by-step reasoning before giving a final output, often referred to as chain-of …

Using large language models to simulate multiple humans and replicate human subject studies

GV Aher, RI Arriaga, AT Kalai - International Conference on …, 2023 - proceedings.mlr.press
We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what
extent a given language model, such as a GPT model, can simulate different aspects of …

OctoPack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - … 2023 Workshop on …, 2023 - openreview.net
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Prompting is programming: A query language for large language models

L Beurer-Kellner, M Fischer, M Vechev - Proceedings of the ACM on …, 2023 - dl.acm.org
Large language models have demonstrated outstanding performance on a wide range of
tasks such as question answering and code generation. On a high level, given an input, a …

Symbol tuning improves in-context learning in language models

J Wei, L Hou, A Lampinen, X Chen, D Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present symbol tuning: finetuning language models on in-context input-label pairs where
natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary …

Inverse scaling: When bigger isn't better

IR McKenzie, A Lyzhov, M Pieler, A Parrish… - arXiv preprint arXiv …, 2023 - arxiv.org
Work on scaling laws has found that large language models (LMs) show predictable
improvements to overall loss with increased scale (model size, training data, and compute) …

Artificial intelligence supporting independent student learning: An evaluative case study of ChatGPT and learning to code

K Hartley, M Hayak, UH Ko - Education Sciences, 2024 - mdpi.com
Artificial intelligence (AI) tools like ChatGPT demonstrate the potential to support
personalized and adaptive learning experiences. This study explores how ChatGPT can …

GLoRe: When, where, and how to improve LLM reasoning via global and local refinements

A Havrilla, S Raparthy, C Nalmpantis… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art language models can exhibit impressive reasoning refinement capabilities
on math, science, or coding tasks. However, recent work demonstrates that even the best …

A close look into the calibration of pre-trained language models

Y Chen, L Yuan, G Cui, Z Liu, H Ji - arXiv preprint arXiv:2211.00151, 2022 - arxiv.org
Pre-trained language models (PLMs) may fail to give reliable estimates of their predictive
uncertainty. We take a close look into this problem, aiming to answer two questions: (1) Do …