ChatGPT for good? On opportunities and challenges of large language models for education

E Kasneci, K Seßler, S Küchemann, M Bannert… - Learning and individual …, 2023 - Elsevier
Large language models represent a significant advancement in the field of AI. The
underlying technology is key to further innovations and, despite critical views and even bans …

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

R Kamoi, Y Zhang, N Zhang, J Han… - Transactions of the …, 2024 - direct.mit.edu
Self-correction is an approach to improving responses from large language models (LLMs)
by refining the responses using LLMs during inference. Prior work has proposed various self …

Let's verify step by step

H Lightman, V Kosaraju, Y Burda, H Edwards… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, large language models have greatly improved in their ability to perform
complex multi-step reasoning. However, even state-of-the-art models still regularly produce …

Making language models better reasoners with step-aware verifier

Y Li, Z Lin, S Zhang, Q Fu, B Chen… - Proceedings of the …, 2023 - aclanthology.org
Few-shot learning is a challenging task that requires language models to generalize from
limited examples. Large language models like GPT-3 and PaLM have made impressive …

WizardMath: Empowering mathematical reasoning for large language models via reinforced Evol-Instruct

H Luo, Q Sun, C Xu, P Zhao, J Lou, C Tao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs), such as GPT-4, have shown remarkable performance in
natural language processing (NLP) tasks, including challenging mathematical reasoning …

LEVER: Learning to verify language-to-code generation with execution

A Ni, S Iyer, D Radev, V Stoyanov… - International …, 2023 - proceedings.mlr.press
The advent of large language models trained on code (code LLMs) has led to significant
progress in language-to-code generation. State-of-the-art approaches in this area combine …

Chain-of-thought prompting elicits reasoning in large language models

J Wei, X Wang, D Schuurmans… - Advances in neural …, 2022 - proceedings.neurips.cc
We explore how generating a chain of thought---a series of intermediate reasoning steps---
significantly improves the ability of large language models to perform complex reasoning. In …

Deductive verification of chain-of-thought reasoning

Z Ling, Y Fang, X Li, Z Huang, M Lee… - Advances in …, 2024 - proceedings.neurips.cc
Abstract Large Language Models (LLMs) significantly benefit from Chain-of-thought (CoT)
prompting in performing various reasoning tasks. While CoT allows models to produce more …

Training verifiers to solve math word problems

K Cobbe, V Kosaraju, M Bavarian, M Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
State-of-the-art language models can match human performance on many tasks, but they
still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures …

Language models as zero-shot planners: Extracting actionable knowledge for embodied agents

W Huang, P Abbeel, D Pathak… - … conference on machine …, 2022 - proceedings.mlr.press
Can world knowledge learned by large language models (LLMs) be used to act in
interactive environments? In this paper, we investigate the possibility of grounding high-level …