Parrot: Pareto-optimal multi-reward reinforcement learning framework for text-to-image generation

SH Lee, Y Li, J Ke, I Yoo, H Zhang, J Yu… - … on Computer Vision, 2024 - Springer
Recent works have demonstrated that using reinforcement learning (RL) with multiple
quality rewards can improve the quality of generated images in text-to-image (T2I) …

Preference tuning with human feedback on language, speech, and vision tasks: A survey

GI Winata, H Zhao, A Das, W Tang, DD Yao… - arxiv preprint arxiv …, 2024 - arxiv.org
Preference tuning is a crucial process for aligning deep generative models with human
preferences. This survey offers a thorough overview of recent advancements in preference …

Scalable ranked preference optimization for text-to-image generation

S Karthik, H Coskun, Z Akata, S Tulyakov, J Ren… - arxiv preprint arxiv …, 2024 - arxiv.org
Direct Preference Optimization (DPO) has emerged as a powerful approach to align text-to-
image (T2I) models with human feedback. Unfortunately, successful application of DPO to …

IterComp: Iterative composition-aware feedback learning from model gallery for text-to-image generation

X Zhang, L Yang, G Li, Y Cai, J Xie, Y Tang… - arxiv preprint arxiv …, 2024 - arxiv.org
Advanced diffusion models like RPG, Stable Diffusion 3 and FLUX have made notable
strides in compositional text-to-image generation. However, these methods typically exhibit …

ComfyGen: Prompt-adaptive workflows for text-to-image generation

R Gal, A Haviv, Y Alaluf, AH Bermano… - arxiv preprint arxiv …, 2024 - arxiv.org
The practical use of text-to-image generation has evolved from simple, monolithic models to
complex workflows that combine multiple specialized components. While workflow-based …

Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning

R Barceló, C Alcázar, F Tobar - arxiv preprint arxiv:2410.08315, 2024 - arxiv.org
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for
aligning to downstream objectives. In the case of diffusion models (DMs), though RL training …

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

L Eyring, S Karthik, K Roth, A Dosovitskiy… - arxiv preprint arxiv …, 2024 - arxiv.org
Text-to-Image (T2I) models have made significant advancements in recent years, but they
still struggle to accurately capture intricate details specified in complex compositional …

Aligning Few-Step Diffusion Models with Dense Reward Difference Learning

Z Zhang, L Shen, S Zhang, D Ye, Y Luo, M Shi… - arxiv preprint arxiv …, 2024 - arxiv.org
Aligning diffusion models with downstream objectives is essential for their practical
applications. However, standard alignment methods often struggle with step generalization …

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

K Lee, X Li, Q Wang, J He, J Ke, MH Yang… - arxiv preprint arxiv …, 2025 - arxiv.org
Aligning text-to-image (T2I) diffusion models with preference optimization is valuable for
human-annotated datasets, but the heavy cost of manual data collection limits scalability …

Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward

Z Jia, Y Nan, H Zhao, G Liu - arxiv preprint arxiv:2411.15247, 2024 - arxiv.org
Recent research has shown that fine-tuning diffusion models (DMs) with arbitrary rewards,
including non-differentiable ones, is feasible with reinforcement learning (RL) techniques …