Shortcut learning in deep neural networks
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …
A simple LLM framework for long-range video question-answering
We present LLoVi, a language-based framework for long-range video question-answering
(LVQA). Unlike prior long-range video understanding methods, which are often costly and …
Condensed movies: Story based retrieval with contextual embeddings
Our objective in this work is the long range understanding of the narrative structure of
movies. Instead of considering the entire movie, we propose to learn from the 'key scenes' of …
AMEGO: Active Memory from long EGOcentric videos
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their
unstructured nature presents challenges for perception. In this paper, we introduce AMEGO …
Learning to cut by watching movies
Video content creation keeps growing at an incredible pace; yet, creating engaging stories
remains challenging and requires non-trivial video editing expertise. Many video editing …
Grounded multi-hop VideoQA in long-form egocentric videos
This paper considers the problem of Multi-Hop Video Question Answering (MH-VidQA) in
long-form egocentric videos. This task not only requires to answer visual questions, but also …
HLVU: A new challenge to test deep understanding of movies the way humans do
In this paper we propose a new evaluation challenge and direction in the area of High-level
Video Understanding. The challenge we are proposing is designed to test automatic video …
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Large language models such as GPT-3 have demonstrated an impressive capability to
adapt to new tasks without requiring task-specific training data. This capability has been …
Situation and behavior understanding by trope detection on films
The human ability of deep cognitive skills is crucial for the development of various real-world
applications that process diverse and abundant user generated input. While recent progress …
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
Video grounding is a fundamental problem in multimodal content understanding, aiming to
localize specific natural language queries in an untrimmed video. However, current video …