Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

Learning to summarize with human feedback

N Stiennon, L Ouyang, J Wu… - Advances in …, 2020 - proceedings.neurips.cc
As language models become more powerful, training and evaluation are increasingly
bottlenecked by the data and metrics used for a particular task. For example, summarization …

Recursively summarizing books with human feedback

J Wu, L Ouyang, DM Ziegler, N Stiennon… - arXiv preprint arXiv …, 2021 - arxiv.org
A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …

Chain of hindsight aligns language models with feedback

H Liu, C Sferrazza, P Abbeel - arXiv preprint arXiv:2302.02676, 2023 - arxiv.org
Learning from human preferences is important for language models to match human needs
and to align with human and social values. Prior works have achieved remarkable …

Unsupervised evaluation of interactive dialog with DialoGPT

S Mehri, M Eskenazi - arXiv preprint arXiv:2006.12719, 2020 - arxiv.org
It is important to define meaningful and interpretable automatic evaluation metrics for open-
domain dialog research. Standard language generation metrics have been shown to be …

DynaEval: Unifying turn and dialogue level evaluation

C Zhang, Y Chen, LF D'Haro, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …

Hierarchical pre-training for sequence labelling in spoken dialog

E Chapuis, P Colombo, M Manica, M Labeau… - arXiv preprint arXiv …, 2020 - arxiv.org
Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are a key
component of spoken dialog systems. In this work, we propose a new approach to learn …

Disentangling the properties of human evaluation methods: A classification system to support comparability, meta-evaluation and reproducibility testing

A Belz, S Mille, DM Howcroft - 2020 - doras.dcu.ie
Current standards for designing and reporting human evaluations in NLP mean it is
generally unclear which evaluations are comparable and can be expected to yield similar …