Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arXiv preprint arXiv:2210.13438, 2022 - arxiv.org
We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

High-fidelity audio compression with improved RVQGAN

R Kumar, P Seetharaman, A Luebs… - Advances in Neural …, 2024 - proceedings.neurips.cc
Language models have been successfully used to model natural signals, such as
images, speech, and music. A key component of these models is a high quality neural …

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Moshi: a speech-text foundation model for real-time dialogue

A Défossez, L Mazaré, M Orsini, A Royer… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue
framework. Current systems for spoken dialogue rely on pipelines of independent …

CASE-Net: Integrating local and non-local attention operations for speech enhancement

X Xu, W Tu, Y Yang - Speech Communication, 2023 - Elsevier
Local and non-local attention operations are two ubiquitous operations in the domain of
speech enhancement (SE), and they are effective to generate more discriminative patterns …

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

ICASSP 2023 acoustic echo cancellation challenge

R Cutler, A Saabas, T Pärnamaa… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The ICASSP 2023 Acoustic Echo Cancellation Challenge is intended to stimulate research
in acoustic echo cancellation (AEC), which is an important area of speech enhancement and …

The VoicePrivacy 2024 Challenge Evaluation Plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org
The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

FRCRN: Boosting feature representation using frequency recurrence for monaural speech enhancement

S Zhao, B Ma, KN Watcharasupat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED)
structure and a recurrent structure have achieved promising performance for monaural …