Temporal sentence grounding in videos: A survey and future directions
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
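In code, the task the survey defines reduces to a simple signature: take a video and a sentence query, return one (start, end) span. Below is a minimal sketch assuming pre-extracted clip and query features; the function name, the mean-pooling, and the threshold `tau` are illustrative assumptions, and the maximum-subarray scoring is a toy baseline rather than any method from the survey.

```python
from typing import Tuple

import torch


def ground_sentence_in_video(
    video_features: torch.Tensor,   # (num_clips, d) pre-extracted clip features
    query_features: torch.Tensor,   # (num_tokens, d) encoded sentence query
    tau: float = 0.3,               # assumed relevance threshold
) -> Tuple[int, int]:
    """Return (start_clip, end_clip) of the best-scoring contiguous window."""
    query = query_features.mean(dim=0)  # pool query tokens into one vector
    sims = torch.cosine_similarity(video_features, query.unsqueeze(0), dim=1)
    gains = (sims - tau).tolist()       # each clip contributes (sim - tau)
    best, best_span = float("-inf"), (0, 0)
    run_start, run_sum = 0, 0.0
    for i, g in enumerate(gains):       # Kadane's maximum-subarray scan
        if run_sum <= 0.0:
            run_start, run_sum = i, g   # restart the window at clip i
        else:
            run_sum += g
        if run_sum > best:
            best, best_span = run_sum, (run_start, i)
    return best_span
```

A window scores the sum of (similarity - tau) over its clips, so a run of several strongly matching clips can outscore a single best clip; real TSGV models replace this heuristic with learned cross-modal alignment.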
Knowing where to focus: Event-aware transformer for video grounding
Recent DETR-based video grounding models directly predict moment timestamps
without any hand-crafted components, such as a pre-defined proposal or non …
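The direct-prediction recipe this abstract contrasts with proposal-based pipelines can be sketched as a DETR-style decoder head: learnable moment queries attend to fused video-text features and are regressed straight to normalized spans, ranked by a score head instead of being filtered with NMS. The class name, dimensions, and layer counts below are illustrative assumptions, not this paper's event-aware architecture.

```python
import torch
import torch.nn as nn


class MomentDETRHead(nn.Module):
    """Decodes learnable moment queries into span predictions, DETR-style."""

    def __init__(self, d_model: int = 256, num_queries: int = 10):
        super().__init__()
        self.moment_queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        # Regress each decoded query straight to a normalized (center, width).
        self.span_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 2)
        )
        self.score_head = nn.Linear(d_model, 1)  # per-query foreground logit

    def forward(self, memory: torch.Tensor):
        # memory: (batch, num_clips, d_model) fused video-text features
        queries = self.moment_queries.expand(memory.size(0), -1, -1)
        decoded = self.decoder(queries, memory)
        spans = self.span_head(decoded).sigmoid()  # normalized (center, width)
        logits = self.score_head(decoded)          # rank spans, no NMS needed
        return spans, logits
```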
You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate the
target moment that semantically corresponds to a sentence query. Although previous respectable works …
Towards generalisable video moment retrieval: Visual-dynamic injection to image-text pre-training
The correlation between vision and text is essential for video moment retrieval (VMR);
however, existing methods rely heavily on separate pre-training feature extractors for visual …
Rethinking weakly-supervised video temporal grounding from a game perspective
This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …
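For context, the moment-proposal-selection framework this abstract refers to typically enumerates multi-scale sliding-window candidates, scores each against the query, and keeps the top one; in the weakly-supervised setting that top proposal then stands in for the missing temporal annotation. A minimal sketch under that reading follows; the window scales, stride, and cosine scoring are all assumptions for illustration.

```python
import torch


def select_moment_proposal(
    clip_feats: torch.Tensor,     # (num_clips, d) video clip features
    query_feat: torch.Tensor,     # (d,) pooled sentence feature
    window_sizes=(4, 8, 16),      # assumed multi-scale window lengths
    stride: int = 2,
):
    """Score sliding-window proposals against the query and keep the best."""
    best_score, best_span = float("-inf"), (0, 0)
    for w in window_sizes:
        for start in range(0, max(1, len(clip_feats) - w + 1), stride):
            proposal = clip_feats[start : start + w].mean(dim=0)  # pool window
            score = torch.cosine_similarity(proposal, query_feat, dim=0).item()
            if score > best_score:
                best_score, best_span = score, (start, start + w)
    return best_span, best_score
```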
Fewer steps, better performance: Efficient cross-modal clip trimming for video moment retrieval using language
Given an untrimmed video and a sentence query, video moment retrieval using language
(VMR) aims to locate a target query-relevant moment. Since the untrimmed video is …
HawkEye: Training video-text LLMs for grounding text in videos
Video-text Large Language Models (video-text LLMs) have shown remarkable performance
in answering questions and holding conversations on simple videos. However, they perform …
AutoEval-Video: An automatic benchmark for assessing large vision-language models in open-ended video question answering
We propose a novel and challenging benchmark, AutoEval-Video, to comprehensively
evaluate large vision-language models in open-ended video question answering. The …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed, containing much …
Partial annotation-based video moment retrieval via iterative learning
Given a descriptive language query, Video Moment Retrieval (VMR) aims to locate the
corresponding semantically consistent moment clip in the video, which is represented as a pair …