AdamP: Slowing down the slowdown for momentum optimizers on scale-invariant weights

B Heo, S Chun, SJ Oh, D Han, S Yun, G Kim… - arXiv preprint arXiv …, 2020 - arxiv.org
Normalization techniques are a boon for modern deep learning. They let weights converge
more quickly, often with better generalization performance. It has been argued that the …

Evaluation of CNN-based automatic music tagging models

M Won, A Ferraro, D Bogdanov, X Serra - arXiv preprint arXiv:2006.00751, 2020 - arxiv.org
Recent advances in deep learning accelerated the development of content-based automatic
music tagging systems. Music information retrieval (MIR) researchers proposed various …

LP-MusicCaps: LLM-based pseudo music captioning

SH Doh, K Choi, J Lee, J Nam - arXiv preprint arXiv:2307.16372, 2023 - arxiv.org
Automatic music captioning, which generates natural language descriptions for given music
tracks, holds significant potential for enhancing the understanding and organization of large …

Solving audio inverse problems with a diffusion model

E Moliner, J Lehtinen, V Välimäki - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained,
be used to solve various audio inverse problems in a problem-agnostic setting …

Semi-supervised music tagging transformer

M Won, K Choi, X Serra - arXiv preprint arXiv:2111.13457, 2021 - arxiv.org
We present the Music Tagging Transformer, which is trained with a semi-supervised approach. The
proposed model captures local acoustic characteristics in shallow convolutional layers, then …

Backpropagation with biologically plausible spatiotemporal adjustment for training deep spiking neural networks

G Shen, D Zhao, Y Zeng - Patterns, 2022 - cell.com
The spiking neural network (SNN) mimics the information-processing operation in the
human brain. Directly applying backpropagation to the training of the SNN still has a …

MatchboxNet: 1D time-channel separable convolutional neural network architecture for speech commands recognition

S Majumdar, B Ginsburg - arXiv preprint arXiv:2004.08531, 2020 - arxiv.org
We present MatchboxNet, an end-to-end neural network for speech command
recognition. MatchboxNet is a deep residual network composed of blocks of 1D time …

Modeling beats and downbeats with a time-frequency transformer

YN Hung, JC Wang, X Song, WT Lu… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The Transformer is a successful deep neural network (DNN) architecture that has shown its
versatility not only in natural language processing but also in music information retrieval …

An interpretable deep learning model for automatic sound classification

P Zinemanas, M Rocamora, M Miron, F Font, X Serra - Electronics, 2021 - mdpi.com
Deep learning models have improved cutting-edge technologies in many research areas,
but their black-box structure makes it difficult to understand their inner workings and the …

Recommendation with generative models

Y Deldjoo, Z He, J McAuley, A Korikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …