Improving Video Moment Retrieval by Auxiliary Moment-Query Pairs with Hyper-Interaction

R Zeng, Y Zhuo, J Li, Y Yang, H Wu… - … on Circuits and …, 2024 - ieeexplore.ieee.org
Most existing video moment retrieval (VMR) benchmark datasets face a common issue of
sparse annotations: only a few moments are annotated. We argue that videos contain a …

PLOVAD: Prompting Vision-Language Models for Open Vocabulary Video Anomaly Detection

C Xu, K Xu, X Jiang, T Sun - … on Circuits and Systems for Video …, 2025 - ieeexplore.ieee.org
Video anomaly detection (VAD) confronts significant challenges arising from data scarcity in
real-world open scenarios, encompassing sparse annotations, labeling costs, and …

Open-Vocabulary Spatio-Temporal Action Detection

T Wu, S Ge, J Qin, G Wu, L Wang - arXiv preprint arXiv:2405.10832, 2024 - arxiv.org
Spatio-temporal action detection (STAD) is an important fine-grained video understanding
task. Current methods require box and label supervision for all action classes in advance …

How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey

Y Qi, H Li, Y Song, X Wu, J Luo - arXiv preprint arXiv:2412.08158, 2024 - arxiv.org
The exploration of various vision-language tasks, such as visual captioning, visual question
answering, and visual commonsense reasoning, is an important area in artificial intelligence …