- Academic Search

An overview on language models: Recent developments and outlook

C Wei, YC Wang, B Wang, CCJ Kuo - arXiv preprint arXiv:2303.05759, 2023 - arxiv.org

Language modeling studies the probability distributions over strings of texts. It is one of the
most fundamental tasks in natural language processing (NLP). It has been widely used in …

Cited by 60 (3 versions)

[PDF] ieee.org

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org

We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Cited by 103 (7 versions)

[PDF] neurips.cc

Hyporadise: An open baseline for generative speech recognition with large language models

C Chen, Y Hu, CHH Yang… - Advances in …, 2023 - proceedings.neurips.cc

Advancements in deep neural networks have allowed automatic speech recognition (ASR)
systems to attain human parity on several publicly available clean speech datasets …

Cited by 49 (9 versions)

[PDF] ieee.org

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Cited by 176 (6 versions)

[PDF] arxiv.org

Deep audio-visual speech recognition

T Afouras, JS Chung, A Senior… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 964 (15 versions)

[PDF] arxiv.org

Streaming end-to-end speech recognition for mobile devices

Y He, TN Sainath, R Prabhavalkar… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org

End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …

Cited by 772 (9 versions)

[PDF] thecvf.com

Object relational graph with teacher-recommended learning for video captioning

Z Zhang, Y Shi, C Yuan, B Li, P Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com

Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …

Cited by 370 (8 versions)

[PDF] arxiv.org

State-of-the-art speech recognition with sequence-to-sequence models

CC Chiu, TN Sainath, Y Wu… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS),
subsume the acoustic, pronunciation and language model components of a traditional …

Cited by 1495 (10 versions)

[PDF] thecvf.com

Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Cited by 110 (12 versions)

[PDF] arxiv.org

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arXiv preprint arXiv …, 2020 - arxiv.org

We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …

Cited by 179 (6 versions)

An analysis of incorporating an external language model into a sequence-to-sequence model

An overview on language models: Recent developments and outlook