Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
Vid2seq: Large-scale pretraining of a visual language model for dense video captioning
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos, which are readily available at scale. The Vid2Seq …
Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), also known as natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
Alleviating noisy pseudo labels remains a key challenge in Semi-Supervised Temporal
Action Localization (SS-TAL). Existing methods often filter pseudo labels based on strict …
BiLL-VTG: Bridging Large Language Models and Lightweight Visual Tools for Video-based Texts Generation
J Qi, K Ji, J Yu, D Wang, B Xu, L Hou, J Li - ar**, R Basri… - The Thirty-eighth Annual … - openreview.net
The recent emergence of powerful Vision-Language models (VLMs) has significantly
improved image captioning. Some of these models are extended to caption videos as well …
Vidcap-Llm: Vision-Transformer and Large Language Model for Video Captioning with Linguistic Semantics Integration
Video captioning models produce textual descriptions based on content, emphasizing the
pivotal role of representation learning. Conventional methods are primarily designed within …