Speech resynthesis from discrete disentangled self-supervised representations

A Polyak, Y Adi, J Copet, E Kharitonov… - arXiv preprint arXiv …, 2021 - arxiv.org
We propose using self-supervised discrete representations for the task of speech
resynthesis. To generate disentangled representation, we separately extract low-bitrate …

LPCNet: Improving neural speech synthesis through linear prediction

JM Valin, J Skoglund - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
Neural speech synthesis models have recently demonstrated the ability to synthesize high
quality speech for text-to-speech and compression applications. These new models often …

HiFi-Codec: Group-residual vector quantization for high fidelity audio codec

D Yang, S Liu, R Huang, J Tian, C Weng… - arXiv preprint arXiv …, 2023 - arxiv.org
Audio codec models are widely used in audio communication as a crucial technique for
compressing audio into discrete representations. Nowadays, audio codec models are …

AudioDec: An open-source streaming high-fidelity neural audio codec

YC Wu, ID Gebru, D Marković… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
A good audio codec for live applications such as telecommunication is characterized by
three key properties: (1) compression, i.e. the bitrate that is required to transmit the signal …

ViSQOL v3: An open source production ready objective speech and audio metric

M Chinen, FSC Lim, J Skoglund… - … on quality of …, 2020 - ieeexplore.ieee.org
Estimation of perceptual quality in audio and speech is possible using a variety of methods.
The combined v3 release of ViSQOL and ViSQOLAudio (for speech and audio …