Grounded text-to-image synthesis with attention refocusing

Q Phung, S Ge, JB Huang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …

See, Say, and Segment: Teaching LMMs to Overcome False Premises

TH Wu, G Biamby, D Chan, L Dunlap… - Proceedings of the …, 2024 - openaccess.thecvf.com

Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary
language grounding and segmentation but can suffer under false premises …

Branch-Solve-Merge improves large language model evaluation and generation

S Saha, O Levy, A Celikyilmaz, M Bansal… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) are frequently used for multi-faceted language generation
and evaluation tasks that involve satisfying intricate user constraints or taking into account …

Davidsonian Scene Graph: Improving reliability in fine-grained evaluation for text-image generation

J Cho, Y Hu, R Garg, P Anderson, R Krishna… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …

Ranni: Taming text-to-image diffusion for accurate instruction following

Y Feng, B Gong, D Chen, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts,
especially those with quantity, object-attribute binding, and multi-subject descriptions. In this …

VideoDirectorGPT: Consistent multi-scene video generation via LLM-guided planning

H Lin, A Zala, J Cho, M Bansal - arXiv preprint arXiv:2309.15091, 2023 - arxiv.org

Although recent text-to-video (T2V) generation methods have seen significant
advancements, most of these works focus on producing short video clips of a single event …

Evaluating and Improving Compositional Text-to-Visual Generation

B Li, Z Lin, D Pathak, J Li, Y Fei, K Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

While text-to-visual models now produce photo-realistic images and videos, they struggle
with compositional text prompts involving attributes, relationships, and higher-order …

Rich human feedback for text-to-image generation

Y Liang, J He, G Li, P Li, A Klimovskiy… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen
have made significant progress in generating high-resolution images based on text …
