Deep learning for intelligent human–computer interaction

Z Lv, F Poiesi, Q Dong, J Lloret, H Song - Applied Sciences, 2022 - mdpi.com
In recent years, gesture recognition and speech recognition, as important input methods in
Human–Computer Interaction (HCI), have been widely used in the field of virtual reality. In …

Automatic speech recognition: Systematic literature review

S Alharbi, M Alrazgan, A Alrashed, T Alnomasi… - IEEE …, 2021 - ieeexplore.ieee.org
A huge amount of research has been done in the field of speech signal processing in recent
years. In particular, there has been increasing interest in the automatic speech recognition …

Transformer-based online CTC/attention end-to-end speech recognition architecture

H Miao, G Cheng, C Gao, P Zhang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Recently, Transformer has gained success in automatic speech recognition (ASR) field.
However, it is challenging to deploy a Transformer-based end-to-end (E2E) model for online …

Open source MagicData-RAMC: A rich annotated Mandarin conversational (RAMC) speech dataset

Z Yang, Y Chen, L Luo, R Yang, L Ye, G Cheng… - ar…

… continuous speech on smartphones via motion sensors

S Zhang, Y Liu, M Gowda - Proceedings of the ACM on Interactive …, 2023 - dl.acm.org
This paper presents iSpyU, a system that shows the feasibility of recognition of natural
speech content played on a phone during conference calls (Skype, Zoom, etc) using a …

CCE-Net: Causal Convolution Embedding Network for Streaming Automatic Speech Recognition

F Deng, Y Ming, B Lyu - International Journal of Network Dynamics and …, 2024 - sciltp.com
Streaming Automatic Speech Recognition (ASR) has gained significant attention across
various application scenarios, including video conferencing, live sports events, and …

Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring

Q Li, C Zhang, PC Woodland - Speech Communication, 2023 - Elsevier
The traditional hybrid deep neural network (DNN)–hidden Markov model (HMM) system and
attention-based encoder–decoder (AED) model are both commonly used automatic speech …

ETEH: Unified attention-based end-to-end ASR and KWS architecture

G Cheng, H Miao, R Yang, K Deng… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Even though attention-based end-to-end (E2E) automatic speech recognition (ASR) models
have been yielding state-of-the-art recognition accuracy, they still fall behind many of the …

Alleviating ASR long-tailed problem by decoupling the learning of representation and classification

K Deng, G Cheng, R Yang, Y Yan - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
Recently, we have witnessed excellent improvement of end-to-end (E2E) automatic speech
recognition (ASR). However, how to tackle the long-tailed data distribution problem while …

A comprehensive review of recent automatic speech summarization and keyword identification techniques

T Kumar, M Mahrishi, G Meena - Artificial Intelligence in Industrial …, 2022 - Springer
Speech has been the most popular form of human communication. A keyboard or a mouse,
on the other hand, is the most common way of entering data into a computer. It would be …