Vision + language applications: A survey

Y Zhou, N Shimada - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Text-to-image generation has attracted significant interest from researchers and practitioners
in recent years due to its widespread and diverse applications across various industries …

Zero and few-shot semantic parsing with ambiguous inputs

E Stengel-Eskin, K Rawlins, B Van Durme - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the ubiquity of ambiguity in natural language, it is often ignored or deliberately
removed in semantic parsing tasks, which generally assume that a given surface form has …

BigVideo: A large-scale video subtitle translation dataset for multimodal machine translation

L Kang, L Huang, N Peng, P Zhu, Z Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of
multi-modality machine translation. Compared with the widely used How2 and VaTeX …

Enhancing Intent Understanding for Ambiguous Prompt: A Human-Machine Co-Adaption Strategy

Y He, J Wang, K Li, Y Wang, L Sun, J Yin… - Available at SSRN …, 2024 - papers.ssrn.com
Modern image generation systems have demonstrated the ability to produce realistic and
high-quality visuals. However, user prompts often contain ambiguities, making it challenging …

Enhancing Intent Understanding for Ambiguous Prompts through Human-Machine Co-Adaptation

Y He, J Wang, K Li, Y Wang, L Sun, J Yin… - arXiv preprint arXiv …, 2025 - arxiv.org
Modern image generation systems can produce high-quality visuals, yet user prompts often
contain ambiguities, requiring multiple revisions. Existing methods struggle to address the …

Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding

Z Fan, X Li, K Nag, C Fang, T Biswas, J Xu… - … Proceedings of the ACM …, 2024 - dl.acm.org
The rapid evolution of text-to-image diffusion models has opened the door of generative AI,
enabling the translation of textual descriptions into visually compelling images with …

Scaling Dual Stage Image-Text Retrieval with Multimodal Large Language Models

ZY Wang, YF Wu - 2024 International Joint Conference on …, 2024 - ieeexplore.ieee.org
Multimodal large language models (MLLMs) constructed powerful potential through
generative training in many downstream tasks. However, a significant performance gap …

Modeling Meaning for Description and Interaction

E Stengel-Eskin - 2023 - jscholarship.library.jhu.edu
Language is a powerful tool for communication and coordination, allowing us to
share thoughts, ideas, and instructions with others. Accordingly, enabling people to …