Slow-fast auditory streams for audio recognition

E Kazakos, A Nagrani, A Zisserman… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We propose a two-stream convolutional network for audio recognition, that operates on time-
frequency spectrogram inputs. Following similar success in visual recognition, we learn …

Listening to sounds of silence for speech denoising

R Xu, R Wu, Y Ishiwaka, C Vondrick… - Advances in Neural …, 2020 - proceedings.neurips.cc
We introduce a deep learning model for speech denoising, a long-standing challenge in
audio analysis arising in numerous applications. Our approach is based on a key …

Deep prior-based audio inpainting using multi-resolution harmonic convolutional neural networks

F Miotello, M Pezzoli, L Comanducci… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this manuscript, we propose a novel method to perform audio inpainting, i.e., the
restoration of audio signals presenting multiple missing parts. Audio inpainting can be …

Catch-a-waveform: Learning to generate audio from a single short example

G Greshler, T Shaham… - Advances in Neural …, 2021 - proceedings.neurips.cc
Models for audio generation are typically trained on hours of recordings. Here, we
illustrate that capturing the essence of an audio source is typically possible from as little as a …

I'm sorry for your loss: Spectrally-based audio distances are bad at pitch

J Turian, M Henry - arXiv preprint arXiv:2012.04572, 2020 - arxiv.org
Growing research demonstrates that synthetic failure modes imply poor generalization. We
compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the …

HPPNet: Modeling the harmonic structure and pitch invariance in piano transcription

W Wei, P Li, Y Yu, W Li - arXiv preprint arXiv:2208.14339, 2022 - arxiv.org
While neural network models are making significant progress in piano transcription, they are
becoming more resource-consuming due to requiring larger model size and more …

Deep audio waveform prior

A Turetzky, T Michelson, Y Adi, S Peleg - arXiv preprint arXiv:2207.10441, 2022 - arxiv.org
Convolutional neural networks contain strong priors for generating natural looking images
[1]. These priors enable image denoising, super resolution, and inpainting in an …

Wavelet networks: Scale equivariant learning from raw waveforms

DW Romero, EJ Bekkers, JM Tomczak… - arXiv preprint arXiv …, 2020 - pure.uva.nl
Inducing symmetry equivariance in deep neural architectures has resolved into improved
data efficiency and generalization. In this work, we utilize the concept of scale and …

Denoising cosine similarity: A theory-driven approach for efficient representation learning

T Nakagawa, Y Sanada, H Waida, Y Zhang, Y Wada… - Neural Networks, 2024 - Elsevier
Representation learning has been increasing its impact on the research and
practice of machine learning, since it enables to learn representations that can apply to …

Anomaly detection from a frequency perspective: M-band wavelet packet anomaly detection network

Z Shang, Z Zhao, S Wang, R Yan - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Anomaly detection is a task of identifying samples that significantly differ from the majority.
However, most typical anomaly detection methods often prioritize accuracy over …