Deep scattering network for speech emotion recognition

P Singh, G Saha, M Sahidullah - 2021 29th European Signal …, 2021 - ieeexplore.ieee.org
This paper introduces the scattering transform for speech emotion recognition (SER). The
scattering transform generates feature representations which remain stable to deformations and …

Model-Based Deep Learning for Music Information Research: Leveraging diverse knowledge sources to enhance explainability, controllability, and resource efficiency …

G Richard, V Lostanlen, YH Yang… - IEEE Signal Processing …, 2025 - ieeexplore.ieee.org
In this article, we investigate the notion of model-based deep learning in the realm of music
information research (MIR). Loosely speaking, we refer to the term model-based deep …

Wavelet scattering transform for improving generalization in low-resourced spoken language identification

S Dey, P Singh, G Saha - arXiv preprint arXiv:2310.00602, 2023 - arxiv.org
Commonly used features in spoken language identification (LID), such as mel-spectrogram
or MFCC, lose high-frequency information due to windowing. The loss further increases for …

Perceptual–neural–physical sound matching

H Han, V Lostanlen, M Lagrange - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Sound matching algorithms seek to approximate a target waveform by parametric audio
synthesis. Deep neural networks have achieved promising results in matching sustained …

Differentiable time-frequency scattering on GPU

J Muradeli, C Vahidi, C Wang, H Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency
domain which extracts spectrotemporal modulations at various rates and scales. It offers an …

Perceptual musical similarity metric learning with graph neural networks

C Vahidi, S Singh, E Benetos, H Phan… - … IEEE Workshop on …, 2023 - ieeexplore.ieee.org
Sound retrieval for assisted music composition depends on evaluating similarity between
musical instrument sounds, which is partly influenced by playing techniques. Previous …

Explainable audio classification of playing techniques with layer-wise relevance propagation

C Wang, V Lostanlen… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Deep convolutional networks (convnets) in the time–frequency domain can learn an
accurate and fine-grained categorization of sounds. For example, in the context of music …

Learning to solve inverse problems for perceptual sound matching

H Han, V Lostanlen, M Lagrange - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Perceptual sound matching (PSM) aims to find the input parameters to a synthesizer so as to
best imitate an audio target. Deep learning for PSM optimizes a neural network to analyze …

Acoustic analysis of musical timbre of wooden aerophones

Y Gonzalez, RC Prati - Romanian Journal of Acoustics and Vibration, 2022 - rjav.sra.ro
The characterization of musical timbre, which allows the quantitative evaluation of
audio, is still an open-ended research topic. This paper evaluates a set of dimensionless …

Mean-Field Microcanonical Gradient Descent

M Häggbom, M Karlsmark, J Andén - arXiv preprint arXiv:2403.08362, 2024 - arxiv.org
Microcanonical gradient descent is a sampling procedure for energy-based models allowing
for efficient sampling of distributions in high dimension. It works by transporting samples …