Retrieval-augmented generation for AI-generated content: A survey
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …
LocVTP: Video-text pre-training for temporal localization
Video-Text Pre-training (VTP) aims to learn transferable representations for various
downstream tasks from large-scale web videos. To date, almost all existing VTP methods …
RAP: Efficient text-video retrieval with sparse-and-correlated adapter
Text-Video Retrieval (TVR) aims to align relevant video content with natural language
queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning …
Style-aware two-stage learning framework for video captioning
Significant progress has been made in video captioning in recent years. However, most
existing methods directly learn from all given captions without distinguishing the styles of …
MUSE: Mamba is efficient multi-scale learner for text-video retrieval
Text-Video Retrieval (TVR) aims to align and associate relevant video content with
corresponding natural language queries. Most existing TVR methods are based on large …
Exploiting auxiliary caption for video grounding
Video grounding aims to locate a moment of interest matching the given query sentence
from an untrimmed video. Previous works ignore the sparsity dilemma in video …
FinTextQA: A dataset for long-form financial question answering
Accurate evaluation of financial question answering (QA) systems necessitates a
comprehensive dataset encompassing diverse question types and contexts. However …
Embracing language inclusivity and diversity in CLIP through continual language learning
While vision-language pre-trained models (VL-PTMs) have advanced multimodal research
in recent years, their mastery in a few languages like English restricts their applicability in …
PhysGame: Uncovering physical commonsense violations in gameplay videos
Recent advancements in video-based large language models (Video LLMs) have witnessed
the emergence of diverse capabilities to reason and interpret dynamic visual content …
Zero-shot temporal action detection by learning multimodal prompts and text-enhanced actionness
Zero-shot temporal action detection (ZS-TAD), aiming to recognize and detect new and
unseen video actions, is an emerging and challenging task with limited solutions. Recent …