Vision + language applications: A survey
Text-to-image generation has attracted significant interest from researchers and practitioners
in recent years due to its widespread and diverse applications across various industries …
Zero and few-shot semantic parsing with ambiguous inputs
Despite the ubiquity of ambiguity in natural language, it is often ignored or deliberately
removed in semantic parsing tasks, which generally assume that a given surface form has …
BigVideo: A large-scale video subtitle translation dataset for multimodal machine translation
We present a large-scale video subtitle translation dataset, BigVideo, to facilitate the study of
multi-modality machine translation. Compared with the widely used How2 and VaTeX …
Enhancing Intent Understanding for Ambiguous Prompt: A Human-Machine Co-Adaption Strategy
Modern image generation systems have demonstrated the ability to produce realistic and
high-quality visuals. However, user prompts often contain ambiguities, making it challenging …
Enhancing Intent Understanding for Ambiguous Prompts through Human-Machine Co-Adaptation
Y He, J Wang, K Li, Y Wang, L Sun, J Yin… - arXiv preprint arXiv …, 2025 - arxiv.org
Modern image generation systems can produce high-quality visuals, yet user prompts often
contain ambiguities, requiring multiple revisions. Existing methods struggle to address the …
Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding
The rapid evolution of text-to-image diffusion models has opened the door of generative AI,
enabling the translation of textual descriptions into visually compelling images with …
Scaling Dual Stage Image-Text Retrieval with Multimodal Large Language Models
ZY Wang, YF Wu - 2024 International Joint Conference on …, 2024 - ieeexplore.ieee.org
Multimodal large language models (MLLMs) constructed powerful potential through
generative training in many downstream tasks. However, a significant performance gap …
Modeling Meaning for Description and Interaction
E Stengel-Eskin - 2023 - jscholarship.library.jhu.edu
Language is a powerful tool for communication and coordination, allowing us to
share thoughts, ideas, and instructions with others. Accordingly, enabling people to …