Follow the rules: reasoning for video anomaly detection with large language models

Y Yang, K Lee, B Dariush, Y Cao, SY Lo - European Conference on …, 2024 - Springer
Video Anomaly Detection (VAD) is crucial for applications such as security
surveillance and autonomous driving. However, existing VAD methods provide little …

LoSA: long-short-range adapter for scaling end-to-end temporal action localization

A Gupta, G Mittal, A Magooda, Y Yu, GW Taylor… - arXiv preprint arXiv …, 2024 - arxiv.org
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an
untrimmed video. The emergence of large video foundation models has led RGB-only video …

Anticipating Object State Changes

V Manousaki, K Bacharidis, F Gouidis… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce (a) the new problem of anticipating object state changes in images
and videos during procedural activities, (b) new curated annotation data for object state …

ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck

CH Kao, C Chien, YJ Tseng, YH Chen, A Gnutti… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents the first-ever study of adapting compressed image latents to suit the
needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs) …

TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction

K Takeyama, Y Liu, M Sra - arXiv preprint arXiv:2410.03993, 2024 - arxiv.org
Accurate prediction of human behavior is crucial for AI systems to effectively support real-
world applications, such as autonomous robots anticipating and assisting with human tasks …

Human Action Anticipation: A Survey

B Lai, S Toyer, T Nagarajan, R Girdhar, S Zha… - arXiv preprint arXiv …, 2024 - arxiv.org
Predicting future human behavior is an increasingly popular topic in computer vision, driven
by the interest in applications such as autonomous vehicles, digital assistants and human …

Exocentric To Egocentric Transfer For Action Recognition: A Short Survey

A Thatipelli, SY Lo, AK Roy-Chowdhury - arXiv preprint arXiv:2410.20621, 2024 - arxiv.org
Egocentric vision captures the scene from the point of view of the camera wearer while
exocentric vision captures the overall scene context. Jointly modeling ego and exo views is …

MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation

O Zatsarynna, E Bahrami, YA Farha… - arXiv preprint arXiv …, 2025 - arxiv.org
Our work addresses the problem of stochastic long-term dense anticipation. The goal of this
task is to predict actions and their durations several minutes into the future based on …

About Time: Advances, Challenges, and Outlooks of Action Understanding

A Stergiou, R Poppe - arXiv preprint arXiv:2411.15106, 2024 - arxiv.org
We have witnessed impressive advances in video action understanding. Increased dataset
sizes, variability, and computation availability have enabled leaps in performance and task …