Retrieving multimodal information for augmented generation: A survey
As Large Language Models (LLMs) have become popular, an important trend has emerged of
using multimodality to augment LLMs' generation ability, which enables LLMs to better …
Auslan-daily: Australian sign language translation for daily communication and news
Sign language translation (SLT) aims to convert a continuous sign language video clip into
spoken language. Considering that different geographic regions generally have their own native …
Improving personalized explanation generation through visualization
In modern recommender systems, there are usually comments or reviews from users that
justify their ratings for different items. Trained on such a textual corpus, explainable …
Learning to imagine: Visually-augmented natural language generation
People often imagine relevant scenes to aid in the writing process. In this work, we aim to
utilize visual information for composition in the same manner as humans. We propose a …
Autograph: Enabling visual context via graph alignment in open domain multi-modal dialogue generation
Open-domain multi-modal dialogue systems heavily rely on visual information to generate
contextually relevant responses. Existing open-domain multi-modal dialogue generation …
Zrigf: An innovative multimodal framework for zero-resource image-grounded dialogue generation
Image-grounded dialogue systems benefit greatly from integrating visual information,
resulting in high-quality response generation. However, current models struggle to …
Distilling implicit multimodal knowledge into large language models for zero-resource dialogue generation
Integrating multimodal knowledge into large language models (LLMs) represents a
significant advancement in dialogue generation capabilities. However, the effective …
Identifying untrustworthy samples: Data filtering for open-domain dialogues with bayesian optimization
Being able to reply with a related, fluent, and informative response is an indispensable
requirement for building high-quality conversational agents. To generate better …
Think beyond words: Exploring context-relevant visual commonsense for diverse dialogue generation
Commonsense knowledge has been widely considered for building intelligent open-domain
dialogue agents, aiming to generate meaningful and diverse responses. Previous works in …
Resee: Responding through seeing fine-grained visual knowledge in open-domain dialogue
Incorporating visual knowledge into text-only dialogue systems has become a potential
direction to imitate the way humans think, imagine, and communicate. However, existing …