Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Coot: Cooperative hierarchical transformer for video-text representation learning

S Ging, M Zolfaghari, H Pirsiavash… - Advances in neural …, 2020 - proceedings.neurips.cc
Many real-world video-text tasks involve different levels of granularity, such as frames and
words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this …

Weakly supervised temporal sentence grounding with gaussian-based contrastive proposal learning

M Zheng, Y Huang, Q Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Temporal sentence grounding aims to detect the most salient moment corresponding to the
natural language query from untrimmed videos. As labeling the temporal boundaries is labor …

Counterfactual contrastive learning for weakly-supervised vision-language grounding

Z Zhang, Z Zhao, Z Lin, X He - Advances in Neural …, 2020 - proceedings.neurips.cc
Weakly-supervised vision-language grounding aims to localize a target moment in a video
or a specific region in an image according to the given sentence query, where only video …

Rethinking weakly-supervised video temporal grounding from a game perspective

X Fang, Z Xiong, W Fang, X Qu, C Chen, J Dong… - … on Computer Vision, 2024 - Springer
This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …

Weakly supervised video moment localization with contrastive negative sample mining

M Zheng, Y Huang, Q Chen, Y Liu - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Video moment localization aims at localizing the video segments which are most related to
the given free-form natural language query. The weakly supervised setting, where only …

Weakly supervised temporal sentence grounding with uncertainty-guided self-training

Y Huang, L Yang, Y Sato - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
The task of weakly supervised temporal sentence grounding aims at finding the
corresponding temporal moments of a language description in the video, given video …

Cascaded prediction network via segment tree for temporal video grounding

Y Zhao, Z Zhao, Z Zhang, Z Lin - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Temporal video grounding aims to localize the target segment which is semantically aligned
with the given sentence in an untrimmed video. Existing methods can be divided into two …

Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning

S Chen, YG Jiang - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Dense Event Captioning (DEC) aims to jointly localize and describe multiple events
of interest in untrimmed videos, which is an advancement of the conventional video …

Learning video moment retrieval without a single annotated video

J Gao, C Xu - IEEE Transactions on Circuits and Systems for …, 2021 - ieeexplore.ieee.org
Video moment retrieval has progressed significantly over the past few years, aiming to
search the moment that is most relevant to a given natural language query. Most existing …