Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …
Pseudo-labeling for massively multilingual speech recognition
Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art
monolingual speech recognition systems. In this work, we extend pseudo-labeling to …
Exploration on HuBERT with multiple resolutions
Hidden-unit BERT (HuBERT) is a widely used self-supervised learning (SSL) model in
speech processing. However, we argue that its fixed 20 ms resolution for hidden …
Scaling a simple approach to zero-shot speech recognition
Despite rapid progress in increasing the language coverage of automatic speech
recognition, the field is still far from covering all languages with a known writing script …
Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions
Recent research using pre-trained transformer models suggests that just 10 minutes of
transcribed speech may be enough to fine-tune such a model for automatic speech …
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
The rapid development of neural text-to-speech (TTS) systems has enabled their use in other
areas of natural language processing such as automatic speech recognition (ASR) or …
AV-CPL: Continuous pseudo-labeling for audio-visual speech recognition
Audio-visual speech contains synchronized audio and visual information that provides cross-modal
supervision to learn representations for both automatic speech recognition (ASR) and …
EURO: ESPnet unsupervised ASR open-source toolkit
This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end
open-source toolkit for unsupervised automatic speech recognition (UASR). EURO …
GPU-Accelerated WFST Beam Search Decoder for CTC-Based Speech Recognition
While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy
in automated speech recognition (ASR) pipelines, their performance has been limited by …