Image captioning by diffusion models: a survey

F Daneshfar, A Bartani, P Lotfi - Engineering Applications of Artificial …, 2024 - Elsevier
Diffusion models are increasingly favored over traditional approaches like generative
adversarial networks (GANs) and auto-regressive transformers due to their remarkable …

LLMs meet multimodal generation and editing: A survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

An empirical study and analysis of text-to-image generation using large language model-powered textual representation

Z Tan, M Yang, L Qin, H Yang, Y Qian, Q Zhou… - … on Computer Vision, 2024 - Springer
One critical prerequisite for faithful text-to-image generation is the accurate understanding of
text inputs. Existing methods leverage the text encoder of the CLIP model to represent input …

Diffusion models for intelligent transportation systems: A survey

M Peng, K Chen, X Guo, Q Zhang, H Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Intelligent Transportation Systems (ITS) are vital in modern traffic management and
optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models …

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

D Wang, J Cui, M Li, W Lin, B Chen… - European Conference on …, 2024 - Springer
As the open community of large language models (LLMs) matures, multimodal LLMs
(MLLMs) have promised an elegant bridge between vision and language. However, current …

PSCon: Toward Conversational Product Search

J Zou, M Aliannejadi, E Kanoulas, S Han, H Ma… - arXiv preprint arXiv …, 2025 - arxiv.org
Conversational Product Search (CPS) is confined to simulated conversations due to the lack
of real-world CPS datasets that reflect human-like language. Additionally, current …

Natural Language but Omitted? On the Ineffectiveness of Large Language Models' privacy policy from End-users' Perspective

S Zhang, H Xing, X Yi, H Li - arXiv preprint arXiv:2406.18100, 2024 - arxiv.org
LLM-driven products are increasingly prevalent in our daily lives. With a natural language
based interaction style, people may potentially leak their personal private information. Thus …

Integrating Large Language Models and Diffusion Models in Generative AI Tasks: Progress, Challenges, and Future Directions

B Benjdira, AM Ali, W Boulila, A Koubaa - Authorea Preprints, 2025 - techrxiv.org
Despite the rapid advancements in generative AI, the integration of Large Language Models
(LLMs) with diffusion models remains an underexplored domain, with significant potential for …