Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
High fidelity neural audio compression
We introduce a state-of-the-art real-time, high-fidelity, audio codec leveraging neural
networks. It consists in a streaming encoder-decoder architecture with quantized latent …
Enterprise data management: Types, sources, and real-time applications to enhance business performance - a systematic review
K Ngcobo, S Bhengu, A Mudau, B Thango… - Systematic Review …, 2024 - papers.ssrn.com
In the current digital era, Enterprise Data Management (EDM) plays a pivotal role in
enhancing business performance by ensuring efficient handling of diverse data sources and …
Voicebox: Text-guided multilingual universal speech generation at scale
Large-scale generative models such as GPT and DALL-E have revolutionized the research
community. These models not only generate high fidelity outputs, but are also generalists …
SpeechBrain: A general-purpose speech toolkit
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …
Conditional diffusion probabilistic model for speech enhancement
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …
MetricGAN+: An improved version of MetricGAN for speech enhancement
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …
DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors
Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. The …
DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors
Human subjective evaluation is the "gold standard" to evaluate speech quality optimized for
human perception. Perceptual objective metrics serve as a proxy for subjective scores. We …
Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …