CWCL: Cross-modal transfer with continuously weighted contrastive loss
This paper considers contrastive training for cross-modal 0-shot transfer wherein a pre-
trained model in one modality is used for representation learning in another domain using …
EAT: Self-supervised pre-training with efficient audio transformer
Audio self-supervised learning (SSL) pre-training, which aims to learn good representations
from unlabeled audio, has made remarkable progress. However, the extensive …
Towards open respiratory acoustic foundation models: Pretraining and benchmarking
Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide
range of healthcare applications, yet is currently under-explored. The main problem for …
Self-supervised audio teacher-student transformer for both clip-level and frame-level tasks
Self-supervised learning (SSL) has emerged as a popular approach for learning audio
representations. One goal of audio self-supervised pre-training is to transfer knowledge to …
SAIC: Integration of speech anonymization and identity classification
Speech anonymization and de-identification have garnered significant attention recently,
especially in the healthcare area including telehealth consultations, patient voiceprint …
Perceptual musical features for interpretable audio tagging
In the age of music streaming platforms, the task of automatically tagging music audio has
garnered significant attention, driving researchers to devise methods aimed at enhancing …
Audio-Language Models for Audio-Centric Tasks: A survey
Audio-Language Models (ALMs), which are trained on audio-text data, focus on the
processing, understanding, and reasoning of sounds. Unlike traditional supervised learning …
Masked modeling duo for speech: Specializing general-purpose audio representation to speech using denoising distillation
Self-supervised learning general-purpose audio representations have demonstrated high
performance in a variety of tasks. Although they can be optimized for application by fine …
MDRT: Multi-domain synthetic speech localization
With recent advancements in generating synthetic speech, tools to generate high-quality
synthetic speech impersonating any human speaker are easily available. Several incidents …
SynthAX: A fast modular synthesizer in JAX
Modern audio production relies heavily on realtime audio synthesis. However, accelerating
audio synthesis far beyond realtime speeds has a significant role to play in advancing …