Hubert: Self-supervised speech representation learning by masked prediction of hidden units
Self-supervised approaches for speech representation learning are challenged by three
unique problems:(1) there are multiple sound units in each input utterance,(2) there is no …
unique problems:(1) there are multiple sound units in each input utterance,(2) there is no …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
HuBERT: How much can a bad teacher benefit ASR pre-training?
Compared to vision and language applications, self-supervised pre-training approaches for
ASR are challenged by three unique problems:(1) There are multiple sound units in each …
ASR are challenged by three unique problems:(1) There are multiple sound units in each …
Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition
This paper proposes an efficient memory transformer Emformer for low latency streaming
speech recognition. In Emformer, the long-range history context is distilled into an …
speech recognition. In Emformer, the long-range history context is distilled into an …
Transformer-based acoustic modeling for hybrid speech recognition
We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech
recognition. Several modeling choices are discussed in this work, including various …
recognition. Several modeling choices are discussed in this work, including various …
Streaming transformer-based acoustic models using self-attention with augmented memory
Transformer-based acoustic modeling has achieved great suc-cess for both hybrid and
sequence-to-sequence speech recogni-tion. However, it requires access to the full …
sequence-to-sequence speech recogni-tion. However, it requires access to the full …
Improving RNN transducer based ASR with auxiliary tasks
End-to-end automatic speech recognition (ASR) models with a single neural network have
recently demonstrated state-of-the-art results compared to conventional hybrid speech …
recently demonstrated state-of-the-art results compared to conventional hybrid speech …