- Academic Search

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 294, 3 versions

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

J Hwang, M Hira, C Chen, X Zhang, Z Ni… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …

Cited by 16, 5 versions

Pseudo-labeling for massively multilingual speech recognition

L Lugosch, T Likhomanenko… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art
monolingual speech recognition systems. In this work, we extend pseudo-labeling to …

Cited by 32, 3 versions

Exploration on HuBERT with multiple resolutions

J Shi, Y Tang, H Inaguma, H Gong, J Pino… - arXiv preprint arXiv …, 2023 - arxiv.org

Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in
speech processing. However, we argue that its fixed 20ms resolution for hidden …

Cited by 10, 6 versions

Scaling a simple approach to zero-shot speech recognition

J Zhao, V Pratap, M Auli - arXiv preprint arXiv:2407.17852, 2024 - arxiv.org

Despite rapid progress in increasing the language coverage of automatic speech
recognition, the field is still far from covering all languages with a known writing script …

Cited by 3, 3 versions

Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions

N San, M Bartelds, B Billings, E de Falco… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent research using pre-trained transformer models suggests that just 10 minutes of
transcribed speech may be enough to fine-tune such a model for automatic speech …

Cited by 12, 6 versions

On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition

N Rossenbach, R Schlüter, S Sakti - arXiv preprint arXiv:2407.21476, 2024 - arxiv.org

The rapid development of neural text-to-speech (TTS) systems enabled its usage in other
areas of natural language processing such as automatic speech recognition (ASR) or …

Cited by 3, 6 versions

AV-CPL: Continuous pseudo-labeling for audio-visual speech recognition

A Rouditchenko, R Collobert… - arXiv preprint arXiv …, 2023 - arxiv.org

Audio-visual speech contains synchronized audio and visual information that provides cross-
modal supervision to learn representations for both automatic speech recognition (ASR) and …

Cited by 4, 3 versions

EURO: ESPnet unsupervised ASR open-source toolkit

D Gao, J Shi, SP Chuang, LP Garcia… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-
to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO …

Cited by 7, 5 versions

GPU-Accelerated WFST Beam Search Decoder for CTC-Based Speech Recognition

D Galvez, T Kaldewey - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org

While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy
in automated speech recognition (ASR) pipelines, their performance has been limited by …

Cited by 2, 3 versions

Flashlight: Enabling innovation in tools for machine learning

Scaling speech technology to 1,000+ languages
