Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

A survey on video moment localization

M Liu, L Nie, Y Wang, M Wang, Y Rui - ACM Computing Surveys, 2023 - dl.acm.org
Video moment localization, also known as video moment retrieval, aims to search a target
segment within a video described by a given natural language query. Beyond the task of …
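
Both surveys above describe the same task under different names: given an untrimmed video and a sentence query, return the temporal span that matches the query. As an illustration of the task interface only (not a method from either survey, with placeholder names and random features standing in for real encoders), the toy sketch below scores candidate windows of clip features against a sentence embedding and returns the best-scoring span.

```python
# Toy window-scoring baseline for sentence grounding (illustrative only):
# rank candidate temporal windows by cosine similarity between a query
# embedding and mean-pooled clip features, return the best-scoring window.
import numpy as np

def retrieve_moment(clip_feats, query_emb, window_sizes=(4, 8, 16)):
    """clip_feats: (T, D) per-clip features; query_emb: (D,) sentence embedding.
    Returns ((start_clip, end_clip), score) for the best-scoring window."""
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    T = clip_feats.shape[0]
    best, best_score = (0, min(window_sizes[0], T)), -np.inf
    for w in window_sizes:
        for s in range(max(T - w + 1, 1)):
            seg = clip_feats[s:s + w].mean(axis=0)
            seg = seg / (np.linalg.norm(seg) + 1e-8)
            score = float(q @ seg)
            if score > best_score:
                best, best_score = (s, min(s + w, T)), score
    return best, best_score

# Random features stand in for real video and text encoders.
rng = np.random.default_rng(0)
clips = rng.normal(size=(32, 512))   # 32 clips, 512-d features each
query = rng.normal(size=512)
print(retrieve_moment(clips, query))
```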

Momentdiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in neural …, 2023 - proceedings.neurips.cc
Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …
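
The "from random to real" phrasing points to a generative formulation in which candidate spans are progressively refined rather than scored and ranked. The snippet below is only a generic, diffusion-style schematic of that idea under assumed names (it is not the published MomentDiff architecture): random normalized spans are repeatedly pushed toward the output of a denoising predictor, which is stubbed out here.

```python
# Generic sketch of iterative span refinement from random initial spans
# (illustrative schematic only, not the actual MomentDiff method).
import torch

def refine_spans(predictor, video_feats, query_emb, steps=5):
    """predictor(video_feats, query_emb, spans, t) -> predicted clean spans."""
    spans = torch.rand(video_feats.size(0), 2)          # random (center, width) in [0, 1]
    for t in reversed(range(steps)):
        pred = predictor(video_feats, query_emb, spans, t)
        keep = (t + 1) / (steps + 1)                    # simple interpolation schedule
        spans = keep * spans + (1 - keep) * pred        # move toward the prediction
    return spans.clamp(0.0, 1.0)

# Stub predictor standing in for a trained denoising network.
stub = lambda v, q, s, t: torch.full_like(s, 0.5)
print(refine_spans(stub, torch.randn(2, 64, 256), torch.randn(2, 512)))
```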

Knowing where to focus: Event-aware transformer for video grounding

J Jang, J Park, J Kim, H Kwon… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recent DETR-based video grounding models have made the model directly predict moment
timestamps without any hand-crafted components, such as a pre-defined proposal or non …
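
Since the snippet refers to DETR-style models that regress moment timestamps directly from learnable queries, a minimal sketch of such a span-decoding head is given below. It is an assumption-level illustration of the general DETR-style design, not the event-aware transformer itself; all module names and sizes are placeholders.

```python
# Minimal DETR-style span decoder: learnable moment queries attend to fused
# video features; each query regresses a normalized (center, width) span and
# a confidence score, so no hand-crafted proposals or NMS are needed.
import torch
import torch.nn as nn

class SpanDecoder(nn.Module):
    def __init__(self, d_model=256, num_queries=10, nhead=8, num_layers=2):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.span_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 2))
        self.score_head = nn.Linear(d_model, 1)

    def forward(self, video_feats):
        # video_feats: (B, T, d_model) video features already fused with the query text
        q = self.queries.weight.unsqueeze(0).expand(video_feats.size(0), -1, -1)
        h = self.decoder(q, video_feats)            # (B, num_queries, d_model)
        spans = self.span_head(h).sigmoid()         # normalized (center, width)
        scores = self.score_head(h).squeeze(-1)     # per-query confidence
        return spans, scores

feats = torch.randn(2, 64, 256)                     # 2 videos, 64 clips each
spans, scores = SpanDecoder()(feats)
print(spans.shape, scores.shape)                    # torch.Size([2, 10, 2]) torch.Size([2, 10])
```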

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …

Are binary annotations sufficient? Video moment retrieval via hierarchical uncertainty-based active learning

W Ji, R Liang, Z Zheng, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on video moment retrieval has mostly focused on enhancing the
performance of accuracy, efficiency, and robustness, all of which largely rely on the …

Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding

WJ Moon, S Hyun, SB Lee, JP Heo - arXiv preprint arXiv:2311.08835, 2023 - arxiv.org
Temporal Grounding is to identify specific moments or highlights from a video corresponding
to textual descriptions. Typical approaches in temporal grounding treat all video clips …

Fewer steps, better performance: Efficient cross-modal clip trimming for video moment retrieval using language

X Fang, D Liu, W Fang, P Zhou, Z Xu, W Xu… - Proceedings of the …, 2024 - ojs.aaai.org
Given an untrimmed video and a sentence query, video moment retrieval using language
(VMR) aims to locate a target query-relevant moment. Since the untrimmed video is …

Hawkeye: Training video-text LLMs for grounding text in videos

Y Wang, X Meng, J Liang, Y Wang, Q Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Video-text Large Language Models (video-text LLMs) have shown remarkable performance
in answering questions and holding conversations on simple videos. However, they perform …

Reducing the vision and language bias for temporal sentence grounding

D Liu, X Qu, W Hu - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Temporal sentence grounding (TSG) is an important yet challenging task in multimedia
information retrieval. Although previous TSG methods have achieved decent performance …