Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
A better and faster end-to-end model for streaming ASR
End-to-end (E2E) models have shown to outperform state-of-the-art conventional models for
streaming speech recognition [1] across many dimensions, including quality (as measured …
Dual-mode ASR: Unify and improve streaming ASR with full-context modeling
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as
quickly and accurately as possible, while full-context ASR waits for the completion of a full …
JOIST: A joint speech and text streaming model for ASR
We present JOIST, an algorithm to train a streaming, cascaded, encoder end-to-end (E2E)
model with both speech-text paired inputs, and text-only unpaired inputs. Unlike previous …
How does pre-trained wav2vec 2.0 perform on domain-shifted ASR? An extensive benchmark on air traffic control communications
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled
speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine …
Streaming end-to-end multilingual speech recognition with joint language identification
Language identification is critical for many downstream tasks in automatic speech
recognition (ASR), and is beneficial to integrate into multilingual end-to-end ASR as an …
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling
On-device end-to-end (E2E) models have shown improvements over a conventional model
on Search test sets in both quality, as measured by Word Error Rate (WER)[1], and latency …
Pseudo label is better than human label
State-of-the-art automatic speech recognition (ASR) systems are trained with tens of
thousands of hours of labeled speech data. Human transcription is expensive and time …