Towards Multi-Modal Co-Reference Resolution in Conversational Shopping Agents

S Osebe, P Wanigasekara, T Gueudre… - Proceedings of the …, 2024 - aclanthology.org
The context of modern smart voice assistants is often multi-modal, where images, audio and
video content are consumed by users simultaneously. In such a setup, co-reference …

Visual Item Selection With Voice Assistants: A systems perspective

P Wanigasekara, R Al-Humaimidi, T Gojayev… - … Proceedings of the …, 2023 - dl.acm.org
Interacting with voice assistants, such as Amazon Alexa, to aid in day-to-day tasks has
become a ubiquitous phenomenon in modern-day households. These voice assistants often …

[HTML] Adapting uni-modal language models for dense multi-modal co-reference resolution using parameter augmentation

S Osebe, P Wanigasekara, T Gueudre, T Tran - 2024 - amazon.science
The context of modern smart voice assistants is often multi-modal, where images, audio
and video content are consumed by users simultaneously. In such a setup, co-reference …