Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
Sound event detection: A tutorial
Imagine standing on a street corner in the city. With your eyes closed you can hear and
recognize a succession of sounds: cars passing by, people speaking, their footsteps when …
Wav2CLIP: Learning robust audio representations from CLIP
We propose Wav2CLIP, a robust audio representation learning method by distilling from
Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on …
Listen, think, and understand
The ability of artificial intelligence (AI) systems to perceive and comprehend audio signals is
crucial for many applications. Although significant progress has been made in this area …
BYOL for audio: Self-supervised learning for general-purpose audio representation
Inspired by the recent progress in self-supervised learning for computer vision that
generates supervision using data augmentations, we explore a new general-purpose audio …
Voice2Series: Reprogramming acoustic models for time series classification
Learning to classify time series with limited data is a practical yet challenging problem.
Current methods are primarily based on hand-designed feature extraction rules or domain …
Masked spectrogram modeling using masked autoencoders for learning general-purpose audio representation
Recent general-purpose audio representations show state-of-the-art performance on
various audio tasks. These representations are pre-trained by self-supervised learning …
Contrastive learning of musical representations
While deep learning has enabled great advances in many areas of music, labeled music
datasets remain especially hard, expensive, and time-consuming to create. In this work, we …
Detection of COVID-19 from voice, cough and breathing patterns: Dataset and preliminary results
V Despotovic, M Ismael, M Cornil, R Mc Call… - Computers in Biology …, 2021 - Elsevier
COVID-19 heavily affects breathing and voice and causes symptoms that make patients'
voices distinctive, creating recognizable audio signatures. Initial studies have already …
Stable Audio Open
Open generative models are vitally important for the community, allowing for fine-tunes and
serving as baselines when presenting new models. However, most current text-to-audio …