Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

High fidelity neural audio compression

A Défossez, J Copet, G Synnaeve, Y Adi - arxiv preprint arxiv:2210.13438, 2022 - arxiv.org
We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …

Enterprise data management: Types, sources, and real-time applications to enhance business performance-a systematic review

K Ngcobo, S Bhengu, A Mudau, B Thango… - Systematic Review …, 2024 - papers.ssrn.com
In the current digital era, Enterprise Data Management (EDM) plays a pivotal role in
enhancing business performance by ensuring efficient handling of diverse data sources and …

Voicebox: Text-guided multilingual universal speech generation at scale

M Le, A Vyas, B Shi, B Karrer, L Sari… - Advances in neural …, 2024 - proceedings.neurips.cc
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arxiv preprint arxiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

Conditional diffusion probabilistic model for speech enhancement

YJ Lu, ZQ Wang, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …

Metricgan+: An improved version of metricgan for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arxiv preprint arxiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Human subjective evaluation is the" gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. The …

DNSMOS P. 835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

CKA Reddy, V Gopal, R Cutler - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Human subjective evaluation is the" gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. We …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …