Vision-based traffic accident detection and anticipation: A survey
Traffic accident detection and anticipation is an obstinate road safety problem and
painstaking efforts have been devoted. With the rapid growth of video data, Vision-based …
Lightgt: A light graph transformer for multimedia recommendation
Multimedia recommendation methods aim to discover the user preference on the multi-
modal information to enhance the collaborative filtering (CF) based recommender system …
Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …
Fewer steps, better performance: Efficient cross-modal clip trimming for video moment retrieval using language
Given an untrimmed video and a sentence query, video moment retrieval using language
(VMR) aims to locate a target query-relevant moment. Since the untrimmed video is …
Gradient-regulated meta-prompt learning for generalizable vision-language models
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-
training models to adapt to downstream tasks in a parameter-and data-efficient way, by …
Dual learning with dynamic knowledge distillation for partially relevant video retrieval
J Dong, M Zhang, Z Zhang, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Almost all previous text-to-video retrieval works assume that videos are pre-trimmed with
short durations. However, in practice, videos are generally untrimmed containing much …
Unified generative and discriminative training for multi-modal large language models
In recent times, Vision-Language Models (VLMs) have been trained under two
predominant paradigms. Generative training has enabled Multimodal Large Language …
Online distillation-enhanced multi-modal transformer for sequential recommendation
Multi-modal recommendation systems, which integrate diverse types of information, have
gained widespread attention in recent years. However, compared to traditional collaborative …
Not all inputs are valid: Towards open-set video moment retrieval using language
Video Moment Retrieval (VMR) targets to retrieve the specific moment corresponding to a
sentence query from an untrimmed video. Although recent respectable works have made …
Redundancy-aware transformer for video question answering
This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically,
the current video encoders tend to holistically embed all video clues at different granularities …