- Academic Search

Generating action-conditioned prompts for open-vocabulary video action recognition

Improving Video Moment Retrieval by Auxiliary Moment-Query Pairs with Hyper-Interaction

R Zeng, Y Zhuo, J Li, Y Yang, H Wu… - … on Circuits and …, 2024 - ieeexplore.ieee.org

Most existing video moment retrieval (VMR) benchmark datasets face a common issue of
sparse annotations: only a few moments being annotated. We argue that videos contain a …

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection

C Xu, K Xu, X Jiang, T Sun - … on Circuits and Systems for Video …, 2025 - ieeexplore.ieee.org

Video anomaly detection (VAD) confronts significant challenges arising from data scarcity in
real-world open scenarios, encompassing sparse annotations, labeling costs, and …

[PDF] arxiv.org

Open-Vocabulary Spatio-Temporal Action Detection

T Wu, S Ge, J Qin, G Wu, L Wang - arXiv preprint arXiv:2405.10832, 2024 - arxiv.org

Spatio-temporal action detection (STAD) is an important fine-grained video understanding
task. Current methods require box and label supervision for all action classes in advance …

Cited by: 3

[PDF] arxiv.org

How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey

Y Qi, H Li, Y Song, X Wu, J Luo - arXiv preprint arXiv:2412.08158, 2024 - arxiv.org

The exploration of various vision-language tasks, such as visual captioning, visual question
answering, and visual commonsense reasoning, is an important area in artificial intelligence …
