Challenges and applications of large language models
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …
Training language models to follow instructions with human feedback
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …
Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …
Learning to summarize with human feedback
As language models become more powerful, training and evaluation are increasingly
bottlenecked by the data and metrics used for a particular task. For example, summarization …
Recursively summarizing books with human feedback
A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …
Chain of hindsight aligns language models with feedback
Learning from human preferences is important for language models to match human needs
and to align with human and social values. Prior works have achieved remarkable …
Unsupervised evaluation of interactive dialog with DialoGPT
S Mehri, M Eskenazi - arXiv preprint arXiv:2006.12719, 2020 - arxiv.org
It is important to define meaningful and interpretable automatic evaluation metrics for open-
domain dialog research. Standard language generation metrics have been shown to be …
DynaEval: Unifying turn and dialogue level evaluation
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …
Hierarchical pre-training for sequence labelling in spoken dialog
Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are a key
component of spoken dialog systems. In this work, we propose a new approach to learn …
Disentangling the properties of human evaluation methods: A classification system to support comparability, meta-evaluation and reproducibility testing
Current standards for designing and reporting human evaluations in NLP mean it is
generally unclear which evaluations are comparable and can be expected to yield similar …