Retrieving multimodal information for augmented generation: A survey

R Zhao, H Chen, W Wang, F Jiao, XL Do, C Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
As Large Language Models (LLMs) have become popular, an important trend has emerged of
using multimodality to augment the LLMs' generation ability, which enables LLMs to better …

Auslan-Daily: Australian sign language translation for daily communication and news

X Shen, S Yuan, H Sheng, H Du… - Advances in Neural …, 2024 - proceedings.neurips.cc
Sign language translation (SLT) aims to convert a continuous sign language video clip into a
spoken language. Considering different geographic regions generally have their own native …

Improving personalized explanation generation through visualization

S Geng, Z Fu, Y Ge, L Li, G De Melo… - Proceedings of the 60th …, 2022 - aclanthology.org
In modern recommender systems, there are usually comments or reviews from users that
justify their ratings for different items. Trained on such textual corpus, explainable …

Learning to imagine: Visually-augmented natural language generation

T Tang, Y Chen, Y Du, J Li, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
People often imagine relevant scenes to aid in the writing process. In this work, we aim to
utilize visual information for composition in the same manner as humans. We propose a …

AutoGraph: Enabling visual context via graph alignment in open domain multi-modal dialogue generation

D Zhao, D Han, Y Yuan, B Ning, M Li, Z He… - Proceedings of the 32nd …, 2024 - dl.acm.org
Open-domain multi-modal dialogue systems heavily rely on visual information to generate
contextually relevant responses. Existing open-domain multi-modal dialog generation …

ZRIGF: An innovative multimodal framework for zero-resource image-grounded dialogue generation

B Zhang, J Wang, H Ma, B Xu, H Lin - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Image-grounded dialogue systems benefit greatly from integrating visual information,
resulting in high-quality response generation. However, current models struggle to …

Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation

B Zhang, H Ma, J Ding, J Wang, B Xu, H Lin - Information Fusion, 2025 - Elsevier
Integrating multimodal knowledge into large language models (LLMs) represents a
significant advancement in dialogue generation capabilities. However, the effective …

Identifying untrustworthy samples: Data filtering for open-domain dialogues with Bayesian optimization

L Shen, H Zhan, X Shen, H Chen, X Zhao… - Proceedings of the 30th …, 2021 - dl.acm.org
The ability to reply with a related, fluent, and informative response is an indispensable
requirement for building high-quality conversational agents. In order to generate better …

Think beyond words: Exploring context-relevant visual commonsense for diverse dialogue generation

Y Liu, L Li, B Zhang, Q Huang - Findings of the Association for …, 2022 - aclanthology.org
Commonsense knowledge has been widely considered for building intelligent open-domain
dialogue agents, aiming to generate meaningful and diverse responses. Previous works in …

ReSee: Responding through seeing fine-grained visual knowledge in open-domain dialogue

H Tu, Y Li, F Mi, Z Yang - arXiv preprint arXiv:2305.13602, 2023 - arxiv.org
Incorporating visual knowledge into text-only dialogue systems has become a promising
direction for imitating the way humans think, imagine, and communicate. However, existing …