Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Sound event detection: A tutorial

A Mesaros, T Heittola, T Virtanen… - IEEE Signal …, 2021 - ieeexplore.ieee.org
Imagine standing on a street corner in the city. With your eyes closed you can hear and
recognize a succession of sounds: cars passing by, people speaking, their footsteps when …

Wav2CLIP: Learning robust audio representations from CLIP

HH Wu, P Seetharaman, K Kumar… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We propose Wav2CLIP, a robust audio representation learning method by distilling from
Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on …

Listen, think, and understand

Y Gong, H Luo, AH Liu, L Karlinsky, J Glass - arXiv preprint arXiv …, 2023 - arxiv.org
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …

BYOL for audio: Self-supervised learning for general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Inspired by the recent progress in self-supervised learning for computer vision that
generates supervision using data augmentations, we explore a new general-purpose audio …

Voice2series: Reprogramming acoustic models for time series classification

CHH Yang, YY Tsai, PY Chen - International conference on …, 2021 - proceedings.mlr.press
Learning to classify time series with limited data is a practical yet challenging problem.
Current methods are primarily based on hand-designed feature extraction rules or domain …

Masked spectrogram modeling using masked autoencoders for learning general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Evaluation of Audio …, 2022 - proceedings.mlr.press
Recent general-purpose audio representations show state-of-the-art performance on
various audio tasks. These representations are pre-trained by self-supervised learning …

Contrastive learning of musical representations

J Spijkervet, JA Burgoyne - arXiv preprint arXiv:2103.09410, 2021 - arxiv.org
While deep learning has enabled great advances in many areas of music, labeled music
datasets remain especially hard, expensive, and time-consuming to create. In this work, we …

Detection of COVID-19 from voice, cough and breathing patterns: Dataset and preliminary results

V Despotovic, M Ismael, M Cornil, R Mc Call… - Computers in Biology …, 2021 - Elsevier
COVID-19 heavily affects breathing and voice and causes symptoms that make patients'
voices distinctive, creating recognizable audio signatures. Initial studies have already …

Stable audio open

Z Evans, JD Parker, CJ Carr, Z Zukowski… - arXiv preprint arXiv …, 2024 - arxiv.org
Open generative models are vitally important for the community, allowing for fine-tunes and
serving as baselines when presenting new models. However, most current text-to-audio …