Google Scholar

A deep cross-modality hashing network for SAR and optical remote sensing images retrieval

W Xiong, Z Xiong, Y Zhang, Y Cui… - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org

The content-based remote sensing image retrieval (CBRSIR) has recently become a hot
topic due to its wide applications in analysis of remote sensing data. However, since …

Cited by 59 · All 3 versions

[PDF] arxiv.org

Conditioned source separation for musical instrument performances

O Slizovskaia, G Haro, E Gómez - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

In music source separation, the number of sources may vary for each piece and some of the
sources may belong to the same family of instruments, thus sharing timbral characteristics …

Cited by 45 · All 6 versions

[PDF] thecvf.com

Less can be more: Sound source localization with a classification model

A Senocak, H Ryu, J Kim… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

In this paper, we tackle sound localization as a natural outcome of the audio-visual video
classification problem. Differently from the existing sound localization approaches, we do not …

Cited by 27 · All 5 versions · HTML version

[PDF] arxiv.org

Large scale audiovisual learning of sounds with weakly labeled data

HM Fayek, A Kumar - arXiv preprint arXiv:2006.01595, 2020 - arxiv.org

Recognizing sounds is a key aspect of computational audio scene analysis and machine
perception. In this paper, we advocate that sound recognition is inherently a multi-modal …

Cited by 42 · All 7 versions · HTML version

[PDF] arxiv.org

Cross-modal music-video recommendation: A study of design choices

L Prétet, G Richard, G Peeters - 2021 International Joint …, 2021 - ieeexplore.ieee.org

In this work, we study music/video cross-modal recommendation, i.e., recommending a music
track for a video or vice versa. We rely on a self-supervised learning paradigm to learn from …

Cited by 27 · All 10 versions

SSLNet: A network for cross-modal sound source localization in visual scenes

F Feng, Y Ming, N Hu - Neurocomputing, 2022 - Elsevier

Sound source localization in visual scenes is to associate sounds and their visual
producers. Although great progress has been made in this field, the mixed sounds from …

Cited by 9 · All 2 versions

[PDF] arxiv.org

TriBERT: Full-body human-centric audio-visual representation learning for visual sound separation

T Rahman, M Yang, L Sigal - arXiv preprint arXiv:2110.13412, 2021 - arxiv.org

The recent success of transformer models in language, such as BERT, has motivated the
use of such architectures for multi-modal feature learning and tasks. However, most multi …

Cited by 13 · All 4 versions · HTML version

Unsupervised synthetic acoustic image generation for audio-visual scene understanding

V Sanguineti, P Morerio, A Del Bue… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Acoustic images are an emergent data modality for multimodal scene understanding. Such
images have the peculiarity of distinguishing the spectral signature of the sound coming …

Cited by 7 · All 5 versions

[PDF] arxiv.org

Multimodal Alignment and Fusion: A Survey

S Li, H Tang - arXiv preprint arXiv:2411.17040, 2024 - arxiv.org

This survey offers a comprehensive review of recent advancements in multimodal alignment
and fusion within machine learning, spurred by the growing diversity of data types such as …

All 2 versions · HTML version

[PDF] neurips.cc

TriBERT: Human-centric audio-visual representation learning

T Rahman, M Yang, L Sigal - Advances in Neural …, 2021 - proceedings.neurips.cc

The recent success of transformer models in language, such as BERT, has motivated the
use of such architectures for multi-modal feature learning and tasks. However, most multi …

Cited by 10 · All 3 versions · HTML version

Saved to library

Weakly supervised representation learning for audio-visual scene analysis

A deep cross-modality hashing network for SAR and optical remote sensing images retrieval

Conditioned source separation for musical instrument performances

Less can be more: Sound source localization with a classification model

Large scale audiovisual learning of sounds with weakly labeled data

Cross-modal music-video recommendation: A study of design choices

SSLNet: A network for cross-modal sound source localization in visual scenes

TriBERT: Full-body human-centric audio-visual representation learning for visual sound separation

Unsupervised synthetic acoustic image generation for audio-visual scene understanding

Multimodal Alignment and Fusion: A Survey

TriBERT: Human-centric audio-visual representation learning