Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - … on Computer Vision, 2024 - Springer
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

Grounded text-to-image synthesis with attention refocusing

Q Phung, S Ge, JB Huang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …

See, Say, and Segment: Teaching LMMs to Overcome False Premises

TH Wu, G Biamby, D Chan, L Dunlap… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current open-source Large Multimodal Models (LMMs) excel at tasks such as
open-vocabulary language grounding and segmentation but can suffer under false premises …

Branch-solve-merge improves large language model evaluation and generation

S Saha, O Levy, A Celikyilmaz, M Bansal… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are frequently used for multi-faceted language generation
and evaluation tasks that involve satisfying intricate user constraints or taking into account …

Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-image generation

J Cho, Y Hu, R Garg, P Anderson, R Krishna… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …

Ranni: Taming text-to-image diffusion for accurate instruction following

Y Feng, B Gong, D Chen, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing text-to-image (T2I) diffusion models usually struggle to interpret complex prompts,
especially those with quantity, object-attribute binding, and multi-subject descriptions. In this …

VideoDirectorGPT: Consistent multi-scene video generation via LLM-guided planning

H Lin, A Zala, J Cho, M Bansal - arXiv preprint arXiv:2309.15091, 2023 - arxiv.org
Although recent text-to-video (T2V) generation methods have seen significant
advancements, most of these works focus on producing short video clips of a single event …

Evaluating and Improving Compositional Text-to-Visual Generation

B Li, Z Lin, D Pathak, J Li, Y Fei, K Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …

Rich human feedback for text-to-image generation

Y Liang, J He, G Li, P Li, A Klimovskiy… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen
have made significant progress in generating high-resolution images based on text …

DreamBench++: A human-aligned benchmark for personalized image generation

Y Peng, Y Cui, H Tang, Z Qi, R Dong, J Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
Personalized image generation holds great promise in assisting humans in everyday work
and life due to its impressive function in creatively generating personalized content …