Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models

H Mittal, N Agarwal, SY Lo… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We introduce PlausiVL, a large video-language model for anticipating action sequences that
are plausible in the real world. While significant efforts have been made towards anticipating …

Anticipating Object State Changes

V Manousaki, K Bacharidis, F Gouidis… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce (a) the new problem of anticipating object state changes in images
and videos during procedural activities, (b) new curated annotation data for object state …

Human Action Anticipation: A Survey

B Lai, S Toyer, T Nagarajan, R Girdhar, S Zha… - arXiv preprint arXiv …, 2024 - arxiv.org
Predicting future human behavior is an increasingly popular topic in computer vision, driven
by the interest in applications such as autonomous vehicles, digital assistants, and human …

MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation

O Zatsarynna, E Bahrami, YA Farha… - arXiv preprint arXiv …, 2025 - arxiv.org
Our work addresses the problem of stochastic long-term dense anticipation. The goal of this
task is to predict actions and their durations several minutes into the future based on …

About Time: Advances, Challenges, and Outlooks of Action Understanding

A Stergiou, R Poppe - arXiv preprint arXiv:2411.15106, 2024 - arxiv.org
We have witnessed impressive advances in video action understanding. Increased dataset
sizes, variability, and computation availability have enabled leaps in performance and task …

: UNCERTAINTY GUIDED MULTIMODAL LARGE LANGUAGE MODEL MERGING

H Qu, X Zhao, J Peng, K Lee, B Dariush, T Chen - openreview.net
Multimodal Large Language Models (MLLMs) have gained increasing popularity as a
promising framework for leveraging the strong language reasoning capabilities in the vision …