When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Self-correction is an approach to improving responses from large language models (LLMs)
by refining the responses using LLMs during inference. Prior work has proposed various self …
Next-generation database interfaces: A survey of LLM-based text-to-SQL
Generating accurate SQL from natural language questions (text-to-SQL) is a long-standing
challenge due to the complexities in user question understanding, database schema …
Siren's song in the AI ocean: a survey on hallucination in large language models
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …
Augmented language models: a survey
This survey reviews works in which language models (LMs) are augmented with reasoning
skills and the ability to use tools. The former is defined as decomposing a potentially …
Large language models can be easily distracted by irrelevant context
Large language models have achieved impressive performance on various natural
language processing tasks. However, so far they have been evaluated primarily on …
LEVER: Learning to verify language-to-code generation with execution
The advent of large language models trained on code (code LLMs) has led to significant
progress in language-to-code generation. State-of-the-art approaches in this area combine …
DS-1000: A natural and reliable benchmark for data science code generation
We introduce DS-1000, a code generation benchmark with a thousand data science
problems spanning seven Python libraries, such as NumPy and pandas. Compared to prior …
Language models are multilingual chain-of-thought reasoners
We evaluate the reasoning abilities of large language models in multilingual settings. We
introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating …
CodeT: Code generation with generated tests
The task of generating code solutions for a given programming problem can benefit from the
use of pre-trained language models such as Codex, which can produce multiple diverse …
Ask me anything: A simple strategy for prompting language models
Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a
natural language prompt that demonstrates how to perform the task and no additional …