A deep cross-modality hashing network for SAR and optical remote sensing images retrieval
W **ong, Z **ong, Y Zhang, Y Cui… - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
The content-based remote sensing image retrieval (CBRSIR) has recently become a hot
topic due to its wide applications in analysis of remote sensing data. However, since …
Conditioned source separation for musical instrument performances
In music source separation, the number of sources may vary for each piece and some of the
sources may belong to the same family of instruments, thus sharing timbral characteristics …
Less can be more: Sound source localization with a classification model
In this paper, we tackle sound localization as a natural outcome of the audio-visual video
classification problem. Differently from the existing sound localization approaches, we do not …
Large scale audiovisual learning of sounds with weakly labeled data
Recognizing sounds is a key aspect of computational audio scene analysis and machine
perception. In this paper, we advocate that sound recognition is inherently a multi-modal …
Cross-modal music-video recommendation: A study of design choices
In this work, we study music/video cross-modal recommendation, i.e., recommending a music
track for a video or vice versa. We rely on a self-supervised learning paradigm to learn from …
SSLNet: A network for cross-modal sound source localization in visual scenes
F Feng, Y Ming, N Hu - Neurocomputing, 2022 - Elsevier
Sound source localization in visual scenes is to associate sounds and their visual
producers. Although great progress has been made in this field, the mixed sounds from …
TriBERT: Full-body human-centric audio-visual representation learning for visual sound separation
The recent success of transformer models in language, such as BERT, has motivated the
use of such architectures for multi-modal feature learning and tasks. However, most multi …
Unsupervised synthetic acoustic image generation for audio-visual scene understanding
Acoustic images are an emergent data modality for multimodal scene understanding. Such
images have the peculiarity of distinguishing the spectral signature of the sound coming …
Multimodal Alignment and Fusion: A Survey
S Li, H Tang - arXiv preprint arXiv:2411.17040, 2024 - arxiv.org
This survey offers a comprehensive review of recent advancements in multimodal alignment
and fusion within machine learning, spurred by the growing diversity of data types such as …
TriBERT: Human-centric audio-visual representation learning
The recent success of transformer models in language, such as BERT, has motivated the
use of such architectures for multi-modal feature learning and tasks. However, most multi …