[HTML][HTML] A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2024 - Elsevier
Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable performance across a wide
array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent …

The unlocking spell on base llms: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri, M Sclar… - arxiv preprint arxiv …, 2023 - arxiv.org
The alignment tuning process of large language models (LLMs) typically involves instruction
learning through supervised fine-tuning (SFT) and preference tuning via reinforcement …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

A survey on knowledge distillation of large language models

X Xu, M Li, C Tao, T Shen, R Cheng, J Li, C Xu… - arxiv preprint arxiv …, 2024 - arxiv.org
In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a
pivotal methodology for transferring advanced capabilities from leading proprietary LLMs …

xcomet: Transparent Machine Translation Evaluation through Fine-grained Error Detection

NM Guerreiro, R Rei, D Stigt, L Coheur… - Transactions of the …, 2024 - direct.mit.edu
Widely used learned metrics for machine translation evaluation, such as Comet and Bleurt,
estimate the quality of a translation hypothesis by providing a single sentence-level score …

Fine-grained human feedback gives better rewards for language model training

Z Wu, Y Hu, W Shi, N Dziri, A Suhr… - Advances in …, 2023 - proceedings.neurips.cc
Abstract Language models (LMs) often exhibit undesirable text generation behaviors,
including generating false, toxic, or irrelevant outputs. Reinforcement learning from human …

Error analysis prompting enables human-like translation evaluation in large language models

Q Lu, B Qiu, L Ding, K Zhang, T Kocmi… - arxiv preprint arxiv …, 2023 - arxiv.org
Generative large language models (LLMs), eg, ChatGPT, have demonstrated remarkable
proficiency across several NLP tasks, such as machine translation, text summarization …

Llm-based nlg evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arxiv preprint arxiv:2402.01383, 2024 - arxiv.org
Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (eg n-gram) overlap …

When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs

R Kamoi, Y Zhang, N Zhang, J Han… - Transactions of the …, 2024 - direct.mit.edu
Self-correction is an approach to improving responses from large language models (LLMs)
by refining the responses using LLMs during inference. Prior work has proposed various self …