Affective visual dialog: A large-scale benchmark for emotional reasoning based on visually grounded conversations

K Haydarov, X Shen, A Madasu, M Salem, LJ Li… - … on Computer Vision, 2024 - Springer
We introduce Affective Visual Dialog, an emotion explanation and reasoning task
as a testbed for research on understanding constructed emotions in response to visually …

OLViT: multi-modal state tracking via attention-based embeddings for video-grounded dialog

A Abdessaied, M von Hochmeister, A Bulling - arXiv preprint arXiv …, 2024 - arxiv.org
We present the Object Language Video Transformer (OLViT), a novel model for video dialog
operating over a multi-modal attention-based dialog state tracker. Existing video dialog …