Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

Learning to summarize with human feedback

N Stiennon, L Ouyang, J Wu… - Advances in …, 2020 - proceedings.neurips.cc
As language models become more powerful, training and evaluation are increasingly
bottlenecked by the data and metrics used for a particular task. For example, summarization …

Recursively summarizing books with human feedback

J Wu, L Ouyang, DM Ziegler, N Stiennon… - arXiv preprint arXiv …, 2021 - arxiv.org
A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …

Chain of hindsight aligns language models with feedback

H Liu, C Sferrazza, P Abbeel - arXiv preprint arXiv:2302.02676, 2023 - arxiv.org
Learning from human preferences is important for language models to match human needs
and to align with human and social values. Prior works have achieved remarkable …

Unsupervised evaluation of interactive dialog with DialoGPT

S Mehri, M Eskenazi - arXiv preprint arXiv:2006.12719, 2020 - arxiv.org
It is important to define meaningful and interpretable automatic evaluation metrics for open-
domain dialog research. Standard language generation metrics have been shown to be …

DynaEval: Unifying turn and dialogue level evaluation

C Zhang, Y Chen, LF D'Haro, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …

Hierarchical pre-training for sequence labelling in spoken dialog

E Chapuis, P Colombo, M Manica, M Labeau… - arXiv preprint arXiv …, 2020 - arxiv.org
Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are a key
component of spoken dialog systems. In this work, we propose a new approach to learn …

Disentangling the properties of human evaluation methods: A classification system to support comparability, meta-evaluation and reproducibility testing

A Belz, S Mille, DM Howcroft - 2020 - doras.dcu.ie
Current standards for designing and reporting human evaluations in NLP mean it is
generally unclear which evaluations are comparable and can be expected to yield similar …