Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arxiv preprint arxiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Causal reasoning meets visual representation learning: A prospective study

Y Liu, YS Wei, H Yan, GB Li, L Lin - Machine Intelligence Research, 2022 - Springer
Visual representation learning is ubiquitous in various real-world applications, including
visual comprehension, video understanding, multi-modal analysis, human-computer …

Avoid-df: Audio-visual joint learning for detecting deepfake

W Yang, X Zhou, Z Chen, B Guo, Z Ba… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Recently, deepfakes have raised severe concerns about the authenticity of online media.
Prior works for deepfake detection have made many efforts to capture the intra-modal …

Semi-supervised and unsupervised deep visual learning: A survey

Y Chen, M Mancini, X Zhu… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
State-of-the-art deep learning models are often trained with a large amount of costly labeled
training data. However, requiring exhaustive manual annotations may degrade the model's …

Sound to visual scene generation by audio-to-visual latent alignment

K Sung-Bin, A Senocak, H Ha… - Proceedings of the …, 2023 - openaccess.thecvf.com
How does audio describe the world around us? In this paper, we propose a method for
generating an image of a scene from sound. Our method addresses the challenges of …

Audio-visual generalised zero-shot learning with cross-modal attention and language

OB Mercea, L Riesch, A Koepke… - Proceedings of the …, 2022 - openaccess.thecvf.com
Learning to classify video data from classes not included in the training data, ie video-based
zero-shot learning, is challenging. We conjecture that the natural alignment between the …

Structured knowledge distillation for accurate and efficient object detection

L Zhang, K Ma - IEEE Transactions on Pattern Analysis and …, 2023 - ieeexplore.ieee.org
Knowledge distillation, which aims to transfer the knowledge learned by a cumbersome
teacher model to a lightweight student model, has become one of the most popular and …

Sound-guided semantic image manipulation

SH Lee, W Roh, W Byeon, SH Yoon… - Proceedings of the …, 2022 - openaccess.thecvf.com
The recent success of the generative model shows that leveraging the multi-modal
embedding space can manipulate an image using text information. However, manipulating …

Modality-aware contrastive instance learning with self-distillation for weakly-supervised audio-visual violence detection

J Yu, J Liu, Y Cheng, R Feng, Y Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Weakly-supervised audio-visual violence detection aims to distinguish snippets containing
multimodal violence events with video-level labels. Many prior works perform audio-visual …

A general dynamic knowledge distillation method for visual analytics

Z Tu, X Liu, X **ao - IEEE Transactions on Image Processing, 2022 - ieeexplore.ieee.org
Existing knowledge distillation (KD) method normally fixes the weight of the teacher network,
and uses the knowledge from the teacher network to guide the training of the student …