MovieChat: From dense token to sparse memory for long video understanding
Recently, integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …
Findings of the 2019 conference on machine translation (WMT19)
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …
Findings of the 2021 conference on machine translation (WMT21)
F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …
Is the reign of interactive search eternal? findings from the video browser showdown 2020
Comprehensive and fair performance evaluation of information retrieval systems represents
an essential task for the current information age. Whereas Cranfield-based evaluations with …
A comprehensive review of the video-to-text problem
Research in the Vision and Language area encompasses challenging topics that seek to
connect visual and textual information. When the visual information is related to videos, this …
SEA: Sentence encoder assembly for video retrieval by textual queries
Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a
core theme in multimedia data management and retrieval. The success of AVS counts on …
MultiVENT: Multilingual Videos of Events and Aligned Natural Text
Everyday news coverage has shifted from traditional broadcasts towards a wide range of
presentation formats such as first-hand, unedited video footage. Datasets that reflect the …
Considering human perception and memory in interactive multimedia retrieval evaluations
Experimental evaluations dealing with visual known-item search tasks, where real users
look for previously observed and memorized scenes in a given video collection, represent a …
Face, body, voice: Video person-clustering with multiple modalities
The objective of this work is person-clustering in videos -- grouping characters according to
their identity. Previous methods focus on the narrower task of face-clustering, and for the …