Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Causal reasoning meets visual representation learning: A prospective study
Visual representation learning is ubiquitous in various real-world applications, including
visual comprehension, video understanding, multi-modal analysis, human-computer …
Semi-supervised and unsupervised deep visual learning: A survey
State-of-the-art deep learning models are often trained with a large amount of costly labeled
training data. However, requiring exhaustive manual annotations may degrade the model's …
AVoiD-DF: Audio-visual joint learning for detecting deepfake
Recently, deepfakes have raised severe concerns about the authenticity of online media.
Prior works for deepfake detection have made many efforts to capture the intra-modal …
Sound to visual scene generation by audio-to-visual latent alignment
How does audio describe the world around us? In this paper, we propose a method for
generating an image of a scene from sound. Our method addresses the challenges of …
Audio-visual generalised zero-shot learning with cross-modal attention and language
Learning to classify video data from classes not included in the training data, i.e., video-based
zero-shot learning, is challenging. We conjecture that the natural alignment between the …
Sound-guided semantic image manipulation
The recent success of the generative model shows that leveraging the multi-modal
embedding space can manipulate an image using text information. However, manipulating …
Integrating language guidance into vision-based deep metric learning
Deep Metric Learning (DML) proposes to learn metric spaces which encode
semantic similarities as embedding space distances. These spaces should be transferable …
Self-supervised predictive learning: A negative-free method for sound source localization in visual scenes
Sound source localization in visual scenes aims to localize objects emitting the sound in a
given image. Recent works showing impressive localization performance typically rely on …
Modality-aware contrastive instance learning with self-distillation for weakly-supervised audio-visual violence detection
Weakly-supervised audio-visual violence detection aims to distinguish snippets containing
multimodal violence events with video-level labels. Many prior works perform audio-visual …