DiffusionRet: Generative text-video retrieval with diffusion model
Existing text-video retrieval solutions are, in essence, discriminant models focused on
maximizing the conditional likelihood, i.e., p(candidates | query). While straightforward, this …
Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs
Most text-driven human motion generation methods employ sequential modeling
approaches, e.g., transformer, to extract sentence-level text representations automatically and …
UATVR: Uncertainty-adaptive text-video retrieval
With the explosive growth of web videos and emerging large-scale vision-language pre-training
models, e.g., CLIP, retrieving videos of interest with text instructions has attracted …
FreestyleRet: Retrieving images from style-diversified queries
Image Retrieval aims to retrieve corresponding images based on a given query. In
application scenarios, users intend to express their retrieval intent through various query …
Discover and align taxonomic context priors for open-world semi-supervised learning
Open-world Semi-Supervised Learning (OSSL) is a realistic and challenging task,
aiming to classify unlabeled samples from both seen and novel classes using partially …
Hierarchical Banzhaf interaction for general video-language representation learning
Multimodal representation learning, with contrastive learning, plays an important role in the
artificial intelligence domain. As an important subfield, video-language representation …
Weakly supervised Gaussian contrastive grounding with large multimodal models for video question answering
Video Question Answering (VideoQA) aims to answer natural language questions based on
the information observed in videos. Despite the recent success of Large Multimodal Models …
Cooperative game modeling with weighted token-level alignment for audio-text retrieval
Previous audio-text retrieval (ATR) methods primarily concentrate on constructing
contrastive pairs between entire audio clips and full caption sentences, while neglecting fine …
Toward 3D spatial reasoning for human-like text-based visual question answering
Text-based Visual Question Answering (TextVQA) aims to produce correct answers for
given questions about the images with multiple scene texts. In most cases, the texts naturally …
Single-shot pruning and quantization for hardware-friendly neural network acceleration
B Jiang, J Chen, Y Liu - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Applying CNN on embedded systems is challenging due to model size limitations. Pruning
and quantization can help, but are time-consuming to apply separately. Our Single-Shot …