Shortcut learning in deep neural networks

R Geirhos, JH Jacobsen, C Michaelis… - Nature Machine …, 2020 - nature.com
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …

A simple llm framework for long-range video question-answering

C Zhang, T Lu, MM Islam, Z Wang, S Yu… - arxiv preprint arxiv …, 2023 - arxiv.org
We present LLoVi, a language-based framework for long-range video question-answering
(LVQA). Unlike prior long-range video understanding methods, which are often costly and …

Condensed movies: Story based retrieval with contextual embeddings

M Bain, A Nagrani, A Brown… - Proceedings of the …, 2020 - openaccess.thecvf.com
Our objective in this work is the long range understanding of the narrative structure of
movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of …

AMEGO: Active Memory from long EGOcentric videos

G Goletto, T Nagarajan, G Averta, D Damen - European Conference on …, 2024 - Springer
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their
unstructured nature presents challenges for perception. In this paper, we introduce AMEGO …

Learning to cut by watching movies

A Pardo, F Caba, JL Alcázar… - Proceedings of the …, 2021 - openaccess.thecvf.com
Video content creation keeps growing at an incredible pace; yet, creating engaging stories
remains challenging and requires non-trivial video editing expertise. Many video editing …

Grounded multi-hop VideoQA in long-form egocentric videos

Q Chen, S Di, W Xie - arxiv preprint arxiv:2408.14469, 2024 - arxiv.org
This paper considers the problem of Multi-Hop Video Question Answering (MH-VidQA) in
long-form egocentric videos. This task not only requires answering visual questions, but also …

HLVU: A new challenge to test deep understanding of movies the way humans do

K Curtis, G Awad, S Rajput, I Soboroff - Proceedings of the 2020 …, 2020 - dl.acm.org
In this paper we propose a new evaluation challenge and direction in the area of High-level
Video Understanding. The challenge we are proposing is designed to test automatic video …

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

J Chung, Y Yu - arxiv preprint arxiv:2311.01233, 2023 - arxiv.org
Large language models such as GPT-3 have demonstrated an impressive capability to
adapt to new tasks without requiring task-specific training data. This capability has been …

Situation and behavior understanding by trope detection on films

CH Chang, HT Su, JH Hsu, YS Wang… - Proceedings of the Web …, 2021 - dl.acm.org
The human capacity for deep cognitive skills is crucial for the development of various real-world
applications that process diverse and abundant user-generated input. While recent progress …

SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

C Tan, Z Lin, J Pu, Z Qi, WY Pei, Z Qu, Y Wang… - Proceedings of the …, 2024 - dl.acm.org
Video grounding is a fundamental problem in multimodal content understanding, aiming to
localize specific natural language queries in an untrimmed video. However, current video …