Evaluating text-to-visual generation with image-to-text generation
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …
Grounded text-to-image synthesis with attention refocusing
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary
language grounding and segmentation but can suffer under false premises …
Branch-solve-merge improves large language model evaluation and generation
Large Language Models (LLMs) are frequently used for multi-faceted language generation
and evaluation tasks that involve satisfying intricate user constraints or taking into account …
Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-image generation
Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …
Ranni: Taming text-to-image diffusion for accurate instruction following
Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts,
especially those with quantity, object-attribute binding, and multi-subject descriptions. In this …
VideoDirectorGPT: Consistent multi-scene video generation via LLM-guided planning
Although recent text-to-video (T2V) generation methods have seen significant
advancements, most of these works focus on producing short video clips of a single event …
Evaluating and Improving Compositional Text-to-Visual Generation
While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …
Rich human feedback for text-to-image generation
Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen
have made significant progress in generating high-resolution images based on text …
DreamBench++: A human-aligned benchmark for personalized image generation
Personalized image generation holds great promise in assisting humans in everyday work
and life due to its impressive function in creatively generating personalized content …