An overview on language models: Recent developments and outlook
Language modeling studies the probability distributions over strings of texts. It is one of the
most fundamental tasks in natural language processing (NLP). It has been widely used in …
most fundamental tasks in natural language processing (NLP). It has been widely used in …
Adaptation algorithms for neural network-based speech recognition: An overview
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …
recognition, considering both hybrid hidden Markov model/neural network systems and end …
Hyporadise: An open baseline for generative speech recognition with large language models
Advancements in deep neural networks have allowed automatic speech recognition (ASR)
systems to attain human parity on several publicly available clean speech datasets …
systems to attain human parity on several publicly available clean speech datasets …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
Deep audio-visual speech recognition
The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …
with or without the audio. Unlike previous works that have focussed on recognising a limited …
Streaming end-to-end speech recognition for mobile devices
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …
speech, are good candidates for on-device speech recognition. E2E models, however …
Object relational graph with teacher-recommended learning for video captioning
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …
video captioning task. Existing models lack adequate visual representation due to the …
State-of-the-art speech recognition with sequence-to-sequence models
Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS),
subsume the acoustic, pronunciation and language model components of a traditional …
subsume the acoustic, pronunciation and language model components of a traditional …
Sub-word level lip reading with visual attention
The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …
silent videos. Most prior works deal with the open-set visual speech recognition problem by …
ESPnet-ST: All-in-one speech translation toolkit
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …