- Academic Search

Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org

Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Cited by 63 - All 2 versions

[PDF] springer.com

Causal reasoning meets visual representation learning: A prospective study

Y Liu, YS Wei, H Yan, GB Li, L Lin - Machine Intelligence Research, 2022 - Springer

Visual representation learning is ubiquitous in various real-world applications, including
visual comprehension, video understanding, multi-modal analysis, human-computer …

Cited by 49 - All 7 versions

[PDF] arxiv.org

Semi-supervised and unsupervised deep visual learning: A survey

Y Chen, M Mancini, X Zhu… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

State-of-the-art deep learning models are often trained with a large amount of costly labeled
training data. However, requiring exhaustive manual annotations may degrade the model's …

Cited by 134 - All 18 versions

Avoid-df: Audio-visual joint learning for detecting deepfake

W Yang, X Zhou, Z Chen, B Guo, Z Ba… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Recently, deepfakes have raised severe concerns about the authenticity of online media.
Prior works for deepfake detection have made many efforts to capture the intra-modal …

Cited by 106 - All 2 versions

[PDF] thecvf.com

Sound to visual scene generation by audio-to-visual latent alignment

K Sung-Bin, A Senocak, H Ha… - Proceedings of the …, 2023 - openaccess.thecvf.com

How does audio describe the world around us? In this paper, we propose a method for
generating an image of a scene from sound. Our method addresses the challenges of …

Cited by 35 - All 6 versions

[PDF] thecvf.com

Audio-visual generalised zero-shot learning with cross-modal attention and language

OB Mercea, L Riesch, A Koepke… - Proceedings of the …, 2022 - openaccess.thecvf.com

Learning to classify video data from classes not included in the training data, i.e., video-based
zero-shot learning, is challenging. We conjecture that the natural alignment between the …

Cited by 63 - All 8 versions

[PDF] thecvf.com

Sound-guided semantic image manipulation

SH Lee, W Roh, W Byeon, SH Yoon… - Proceedings of the …, 2022 - openaccess.thecvf.com

The recent success of the generative model shows that leveraging the multi-modal
embedding space can manipulate an image using text information. However, manipulating …

Cited by 59 - All 9 versions

[PDF] thecvf.com

Integrating language guidance into vision-based deep metric learning

K Roth, O Vinyals, Z Akata - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com

Deep Metric Learning (DML) proposes to learn metric spaces which encode
semantic similarities as embedding space distances. These spaces should be transferable …

Cited by 41 - All 8 versions

[PDF] arxiv.org

Self-supervised predictive learning: A negative-free method for sound source localization in visual scenes

Z Song, Y Wang, J Fan, T Tan, Z Zhang - arXiv preprint arXiv:2203.13412, 2022 - arxiv.org

Sound source localization in visual scenes aims to localize objects emitting the sound in a
given image. Recent works showing impressive localization performance typically rely on …

Cited by 45 - All 4 versions

[PDF] arxiv.org

Modality-aware contrastive instance learning with self-distillation for weakly-supervised audio-visual violence detection

J Yu, J Liu, Y Cheng, R Feng, Y Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Weakly-supervised audio-visual violence detection aims to distinguish snippets containing
multimodal violence events with video-level labels. Many prior works perform audio-visual …

Cited by 40 - All 4 versions

Distilling audio-visual knowledge by compositional contrastive learning

Learning in audio-visual context: A review, analysis, and new perspective
