Adamp: Slowing down the slowdown for momentum optimizers on scale-invariant weights
Normalization techniques are a boon for modern deep learning. They let weights converge
more quickly with often better generalization performances. It has been argued that the …
more quickly with often better generalization performances. It has been argued that the …
Evaluation of cnn-based automatic music tagging models
Recent advances in deep learning accelerated the development of content-based automatic
music tagging systems. Music information retrieval (MIR) researchers proposed various …
music tagging systems. Music information retrieval (MIR) researchers proposed various …
Lp-musiccaps: Llm-based pseudo music captioning
Automatic music captioning, which generates natural language descriptions for given music
tracks, holds significant potential for enhancing the understanding and organization of large …
tracks, holds significant potential for enhancing the understanding and organization of large …
Solving audio inverse problems with a diffusion model
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained,
be used for solving various different audio inverse problems in a problem-agnostic setting …
be used for solving various different audio inverse problems in a problem-agnostic setting …
Semi-supervised music tagging transformer
We present Music Tagging Transformer that is trained with a semi-supervised approach. The
proposed model captures local acoustic characteristics in shallow convolutional layers, then …
proposed model captures local acoustic characteristics in shallow convolutional layers, then …
Backpropagation with biologically plausible spatiotemporal adjustment for training deep spiking neural networks
The spiking neural network (SNN) mimics the information-processing operation in the
human brain. Directly applying backpropagation to the training of the SNN still has a …
human brain. Directly applying backpropagation to the training of the SNN still has a …
Matchboxnet: 1d time-channel separable convolutional neural network architecture for speech commands recognition
We present an MatchboxNet-an end-to-end neural network for speech command
recognition. MatchboxNet is a deep residual network composed from blocks of 1D time …
recognition. MatchboxNet is a deep residual network composed from blocks of 1D time …
Modeling beats and downbeats with a time-frequency transformer
Transformer is a successful deep neural network (DNN) architecture that has shown its
versatility not only in natural language processing but also in music information retrieval …
versatility not only in natural language processing but also in music information retrieval …
An interpretable deep learning model for automatic sound classification
Deep learning models have improved cutting-edge technologies in many research areas,
but their black-box structure makes it difficult to understand their inner workings and the …
but their black-box structure makes it difficult to understand their inner workings and the …
Recommendation with generative models
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …
learning and sampling from their statistical distributions. In recent years, these models have …