Fast timing-conditioned latent audio diffusion

Z Evans, CJ Carr, J Taylor, SH Hawley… - Forty-first International …, 2024 - openreview.net
Generating long-form 44.1 kHz stereo audio from text prompts can be computationally
demanding. Further, most previous works do not tackle that music and sound effects …

Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

Hybrid flexible (HyFlex) seminar delivery–A technical overview of the implementation

R Sanchez-Pizani, M Detyna, S Dance… - Building and …, 2022 - Elsevier
This paper investigates a new technology for Hybrid flexible delivery (known as HyFlex), as
implemented at King's College London. The relatively novel character of HyFlex, of mixing …

Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding

L Schönherr, K Kohls, S Zeiler, T Holz… - arXiv preprint arXiv …, 2018 - arxiv.org
Voice interfaces are becoming accepted widely as input methods for a diverse set of
devices. This development is driven by rapid improvements in automatic speech recognition …

Long-form music generation with latent diffusion

Z Evans, JD Parker, CJ Carr, Z Zukowski… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-based generative models for music have seen great strides recently, but so far have
not managed to produce full-length music tracks with coherent musical structure from text …

A review of neural network-based emulation of guitar amplifiers

T Vanhatalo, P Legrand, M Desainte-Catherine… - Applied Sciences, 2022 - mdpi.com
Vacuum tube amplifiers present sonic characteristics frequently coveted by musicians, that
are often due to the distinct nonlinearities of their circuits, and accurately modelling such …

Insights into deep non-linear filters for improved multi-channel speech enhancement

K Tesch, T Gerkmann - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
The key advantage of using multiple microphones for speech enhancement is that spatial
filtering can be used to complement the tempo-spectral processing. In a traditional setting …

ESPnet2-TTS: Extending the edge of TTS research

T Hayashi, R Yamamoto, T Yoshimura, P Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …

An open source implementation of ITU-T Recommendation P.808 with validation

B Naderi, R Cutler - arXiv preprint arXiv:2005.08138, 2020 - arxiv.org
The ITU-T Recommendation P.808 provides a crowdsourcing approach for conducting a
subjective assessment of speech quality using the Absolute Category Rating (ACR) method …

Differentiable artificial reverberation

S Lee, HS Choi, K Lee - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Artificial reverberation (AR) models play a central role in various audio applications.
Therefore, estimating the AR model parameters (ARPs) of a reference reverberation is a …