A survey of video datasets for grounded event understanding

K Sanders, B Van Durme - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
While existing video benchmarks largely consider specialized downstream tasks like
retrieval or question-answering (QA), contemporary multimodal AI systems must be capable …

Employing Glyphic Information for Chinese Event Extraction with Vision-Language Model

X Bao, J Gu, Z Wang, M Qiang… - Findings of the …, 2024 - aclanthology.org
As a complex task that requires rich information input, features from various aspects have
been utilized in event extraction. However, most of the previous works ignored the value of …

MUMOSA, Interactive Dashboard for MUlti-MOdal Situation Awareness

S Lukin, S Bowser, R Suchocki… - Proceedings of the …, 2024 - aclanthology.org
Information extraction has led the way for event detection from text for many years.
Recent advances in neural models, such as Large Language Models (LLMs) and Vision …