Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
AudioLM: a language modeling approach to audio generation
We introduce AudioLM, a framework for high-quality audio generation with long-term
consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts …
Unsupervised automatic speech recognition: A review
Automatic Speech Recognition (ASR) systems can be trained to achieve
remarkable performance given large amounts of manually transcribed speech, but large …
ContentVec: An improved self-supervised speech representation by disentangling speakers
Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …
Moshi: a speech-text foundation model for real-time dialogue
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue
framework. Current systems for spoken dialogue rely on pipelines of independent …
Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge
E Dunbar, N Hamilakis… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Recent progress in self-supervised or unsupervised machine learning has opened the
possibility of building a full speech processing system from raw audio without using any …
Analyzing speaker information in self-supervised models to improve zero-resource speech processing
Contrastive predictive coding (CPC) aims to learn representations of speech by
distinguishing future observations from a set of negative examples. Previous work has …
Are discrete units necessary for spoken language modeling?
Recent work in spoken language modeling shows the possibility of learning a language
unsupervisedly from raw audio without any text labels. The approach relies first on …
SpeechGLUE: How well can self-supervised speech models capture linguistic knowledge?
Self-supervised learning (SSL) for speech representation has been successfully applied in
various downstream tasks, such as speech and speaker recognition. More recently, speech …
Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task
Indonesia is home to roughly 700 languages, which amounts to about ten percent of the
global total, positioning it as the second-most linguistically diverse country after Papua New …