Efficient training of audio transformers with patchout
The great success of transformer-based models in natural language processing (NLP) has
led to various attempts at adapting these architectures to other domains such as vision and …
Look, listen, and learn more: Design choices for deep audio embeddings
A considerable challenge in applying deep learning to audio classification is the scarcity of
labeled data. An increasingly popular solution is to learn deep audio embeddings from large …
Music deep learning: deep learning methods for music signal processing—a review of the state-of-the-art
The discipline of Deep Learning has been recognized for its strong computational tools,
which have been extensively used in data and signal processing, with innumerable …
Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation
Crowdsourcing is a popular tool for collecting large amounts of annotated data, but the
specific format of the strong labels necessary for sound event detection is not easily …
Masked spectrogram prediction for self-supervised audio pre-training
Transformer-based models attain excellent results and generalize well when trained on
sufficient amounts of data. However, constrained by the limited data available in the audio …
Receptive field regularization techniques for audio classification and tagging with deep convolutional neural networks
In this paper, we study the performance of variants of well-known Convolutional Neural
Network (CNN) architectures on different audio tasks. We show that tuning the Receptive …
Multi-instrument automatic music transcription with self-attention-based instance segmentation
Multi-instrument automatic music transcription (AMT) is a critical but less investigated
problem in the field of music information retrieval (MIR). With all the difficulties faced by …
On the application of deep learning and multifractal techniques to classify emotions and instruments using Indian Classical Music
Music is often considered as the language of emotions. The way it stimulates the emotional
appraisal across people from different communities, culture and demographics has long …
Training sound event detection with soft labels from crowdsourced annotations
In this paper, we study the use of soft labels to train a system for sound event detection
(SED). Soft labels can result from annotations which account for human uncertainty about …
An attention mechanism for musical instrument recognition
While the automatic recognition of musical instruments has seen significant progress, the
task is still considered hard for music featuring multiple instruments as opposed to single …