How to scale your EMA
Preserving training dynamics across batch sizes is an important tool for practical machine
learning as it enables the trade-off between batch size and wall-clock time. This trade-off is …
Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play
Teaching dexterity to multi-fingered robots has been a longstanding challenge in robotics.
Most prominent work in this area focuses on learning controllers or policies that either …
Masked modeling duo: Learning representations by encouraging both networks to model the input
Masked Autoencoders is a simple yet powerful self-supervised learning method. However, it
learns representations indirectly by reconstructing masked input patches. Several methods …
Self-supervised audio teacher-student transformer for both clip-level and frame-level tasks
Self-supervised learning (SSL) has emerged as a popular approach for learning audio
representations. One goal of audio self-supervised pre-training is to transfer knowledge to …
Self-supervised learning for speech enhancement through synthesis
Modern speech enhancement (SE) networks typically implement noise suppression through
time-frequency masking, latent representation masking, or discriminative signal prediction …
XKD: Cross-modal knowledge distillation with domain alignment for video representation learning
We present XKD, a novel self-supervised framework to learn meaningful representations
from unlabelled videos. XKD is trained with two pseudo objectives. First, masked data …
Self-supervised learning for anomalous sound detection
K Wilkinghoff - … 2024-2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
State-of-the-art anomalous sound detection (ASD) systems are often trained by using an
auxiliary classification task to learn an embedding space. Doing so enables the system to …
On the effect of data-augmentation on local embedding properties in the contrastive learning of music audio representations
Audio embeddings are crucial tools in understanding large catalogs of music. Typically
embeddings are evaluated on the basis of the performance they provide in a wide range of …
Self-supervised learning for few-shot bird sound classification
Self-supervised learning (SSL) in audio holds significant potential across various domains,
particularly in situations where abundant, unlabeled data is readily available at no cost. This …
Benchmarking Representations for Speech, Music, and Acoustic Events
Limited diversity in standardized benchmarks for evaluating audio representation learning
(ARL) methods may hinder systematic comparison of current methods' capabilities. We …