ImageReward: Learning and evaluating human preferences for text-to-image generation

J Xu, X Liu, Y Wu, Y Tong, Q Li… - Advances in …, 2023 - proceedings.neurips.cc
We present a comprehensive solution to learn and improve text-to-image models from
human preference feedback. To begin with, we build ImageReward---the first general …

Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Aligning text-to-image models using human feedback

K Lee, H Liu, M Ryu, O Watkins, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep generative models have shown impressive results in text-to-image synthesis.
However, current text-to-image models often generate images that are inadequately aligned …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

Learning to summarize with human feedback

N Stiennon, L Ouyang, J Wu… - Advances in …, 2020 - proceedings.neurips.cc
As language models become more powerful, training and evaluation are increasingly
bottlenecked by the data and metrics used for a particular task. For example, summarization …

Recursively summarizing books with human feedback

J Wu, L Ouyang, DM Ziegler, N Stiennon… - arXiv preprint arXiv …, 2021 - arxiv.org
A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …

DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback

Y Chen, K Sikka, M Cogswell, H Ji… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural
Language feedback (NLF) from Large Language Models to enhance its alignment and …