- Academic Search

Follow the rules: reasoning for video anomaly detection with large language models

Y Yang, K Lee, B Dariush, Y Cao, SY Lo - European Conference on …, 2024 - Springer

Video Anomaly Detection (VAD) is crucial for applications such as security
surveillance and autonomous driving. However, existing VAD methods provide little …

Cited by: 10 · All 7 versions

StimuVAR: Spatiotemporal stimuli-aware video affective reasoning with multimodal large language models

Y Guo, F Siddiqui, Y Zhao, R Chellappa… - arXiv …

… socially intelligent systems. Although Multimodal Large Language Models (MLLMs) have …

Cited by: 3 · All 3 versions

LoSA: long-short-range adapter for scaling end-to-end temporal action localization

A Gupta, G Mittal, A Magooda, Y Yu, GW Taylor… - arXiv preprint arXiv …, 2024 - arxiv.org

Temporal Action Localization (TAL) involves localizing and classifying action snippets in an
untrimmed video. The emergence of large video foundation models has led RGB-only video …

Cited by: 3 · All 3 versions

Anticipating Object State Changes

V Manousaki, K Bacharidis, F Gouidis… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we introduce (a) the new problem of anticipating object state changes in images
and videos during procedural activities, (b) new curated annotation data for object state …

Cited by: 1 · All 3 versions

ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck

CH Kao, C Chien, YJ Tseng, YH Chen, A Gnutti… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper presents the first-ever study of adapting compressed image latents to suit the
needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs) …

Cited by: 1 · All 3 versions

TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction

K Takeyama, Y Liu, M Sra - arXiv preprint arXiv:2410.03993, 2024 - arxiv.org

Accurate prediction of human behavior is crucial for AI systems to effectively support real-world
applications, such as autonomous robots anticipating and assisting with human tasks …

Cited by: 1 · All 2 versions

Human Action Anticipation: A Survey

B Lai, S Toyer, T Nagarajan, R Girdhar, S Zha… - arXiv preprint arXiv …, 2024 - arxiv.org

Predicting future human behavior is an increasingly popular topic in computer vision, driven
by the interest in applications such as autonomous vehicles, digital assistants and human …

All 2 versions

Exocentric To Egocentric Transfer For Action Recognition: A Short Survey

A Thatipelli, SY Lo, AK Roy-Chowdhury - arXiv preprint arXiv:2410.20621, 2024 - arxiv.org

Egocentric vision captures the scene from the point of view of the camera wearer while
exocentric vision captures the overall scene context. Jointly modeling ego and exo views is …

All 2 versions

MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation

O Zatsarynna, E Bahrami, YA Farha… - arXiv preprint arXiv …, 2025 - arxiv.org

Our work addresses the problem of stochastic long-term dense anticipation. The goal of this
task is to predict actions and their durations several minutes into the future based on …

All 2 versions

About Time: Advances, Challenges, and Outlooks of Action Understanding

A Stergiou, R Poppe - arXiv preprint arXiv:2411.15106, 2024 - arxiv.org

We have witnessed impressive advances in video action understanding. Increased dataset
sizes, variability, and computation availability have enabled leaps in performance and task …

All 3 versions

Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large...

Follow the rules: reasoning for video anomaly detection with large language models
