Efficient training of audio transformers with patchout

K Koutini, J Schlüter, H Eghbal-Zadeh… - arXiv preprint arXiv …, 2021 - arxiv.org
The great success of transformer-based models in natural language processing (NLP) has
led to various attempts at adapting these architectures to other domains such as vision and …

Look, listen, and learn more: Design choices for deep audio embeddings

AL Cramer, HH Wu, J Salamon… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
A considerable challenge in applying deep learning to audio classification is the scarcity of
labeled data. An increasingly popular solution is to learn deep audio embeddings from large …

Music deep learning: deep learning methods for music signal processing—a review of the state-of-the-art

L Moysis, LA Iliadis, SP Sotiroudis, AD Boursianis… - IEEE …, 2023 - ieeexplore.ieee.org
The discipline of Deep Learning has been recognized for its strong computational tools,
which have been extensively used in data and signal processing, with innumerable …

Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation

I Martín-Morató, A Mesaros - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Crowdsourcing is a popular tool for collecting large amounts of annotated data, but the
specific format of the strong labels necessary for sound event detection is not easily …

Masked spectrogram prediction for self-supervised audio pre-training

D Chong, H Wang, P Zhou… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Transformer-based models attain excellent results and generalize well when trained on
sufficient amounts of data. However, constrained by the limited data available in the audio …

Receptive field regularization techniques for audio classification and tagging with deep convolutional neural networks

K Koutini, H Eghbal-zadeh… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
In this paper, we study the performance of variants of well-known Convolutional Neural
Network (CNN) architectures on different audio tasks. We show that tuning the Receptive …

Multi-instrument automatic music transcription with self-attention-based instance segmentation

YT Wu, B Chen, L Su - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
Multi-instrument automatic music transcription (AMT) is a critical but less investigated
problem in the field of music information retrieval (MIR). With all the difficulties faced by …

On the application of deep learning and multifractal techniques to classify emotions and instruments using Indian Classical Music

S Nag, M Basu, S Sanyal, A Banerjee… - Physica A: Statistical …, 2022 - Elsevier
Music is often considered as the language of emotions. The way it stimulates the emotional
appraisal across people from different communities, cultures and demographics has long …

Training sound event detection with soft labels from crowdsourced annotations

I Martín-Morató, M Harju, P Ahokas… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this paper, we study the use of soft labels to train a system for sound event detection
(SED). Soft labels can result from annotations which account for human uncertainty about …

An attention mechanism for musical instrument recognition

S Gururani, M Sharma, A Lerch - arXiv preprint arXiv:1907.04294, 2019 - arxiv.org
While the automatic recognition of musical instruments has seen significant progress, the
task is still considered hard for music featuring multiple instruments as opposed to single …