How to scale your EMA

D Busbridge, J Ramapuram, P Ablin… - Advances in …, 2024 - proceedings.neurips.cc
Preserving training dynamics across batch sizes is an important tool for practical machine
learning as it enables the trade-off between batch size and wall-clock time. This trade-off is …

Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play

I Guzey, B Evans, S Chintala, L Pinto - arXiv preprint arXiv:2303.12076, 2023 - arxiv.org
Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics.
Most prominent work in this area focuses on learning controllers or policies that either …

Masked modeling duo: Learning representations by encouraging both networks to model the input

D Niizumi, D Takeuchi, Y Ohishi… - ICASSP 2023 - 2023 …, 2023 - ieeexplore.ieee.org
Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it
learns representations indirectly by reconstructing masked input patches. Several methods …

Self-supervised audio teacher-student transformer for both clip-level and frame-level tasks

X Li, N Shao, X Li - IEEE/ACM Transactions on Audio, Speech …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) has emerged as a popular approach for learning audio
representations. One goal of audio self-supervised pre-training is to transfer knowledge to …

Self-supervised learning for speech enhancement through synthesis

B Irvin, M Stamenovic, M Kegler… - ICASSP 2023 - 2023 …, 2023 - ieeexplore.ieee.org
Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …

XKD: Cross-modal knowledge distillation with domain alignment for video representation learning

P Sarkar, A Etemad - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
We present XKD, a novel self-supervised framework to learn meaningful representations
from unlabelled videos. XKD is trained with two pseudo objectives. First, masked data …

Self-supervised learning for anomalous sound detection

K Wilkinghoff - … 2024 - 2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
State-of-the-art anomalous sound detection (ASD) systems are often trained by using an
auxiliary classification task to learn an embedding space. Doing so enables the system to …

On the effect of data-augmentation on local embedding properties in the contrastive learning of music audio representations

MC McCallum, MEP Davies, F Henkel… - ICASSP 2024 - 2024 …, 2024 - ieeexplore.ieee.org
Audio embeddings are crucial tools in understanding large catalogs of music. Typically
embeddings are evaluated on the basis of the performance they provide in a wide range of …

Self-supervised learning for few-shot bird sound classification

I Moummad, N Farrugia… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Self-supervised learning (SSL) in audio holds significant potential across various domains,
particularly in situations where abundant, unlabeled data is readily available at no cost. This …

Benchmarking Representations for Speech, Music, and Acoustic Events

M La Quatra, A Koudounas, L Vaiani, E Baralis… - arXiv preprint arXiv …, 2024 - arxiv.org
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …