Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Q Shao, P Guo, J Yan, P Hu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Accents pose significant challenges for speech recognition systems. Although joint
automatic speech recognition (ASR) and accent recognition (AR) training has been proven …
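
The snippet points to joint ASR/AR multi-task training. Purely as a hedged illustration of that general idea (the BiLSTM encoder, the two heads, and the weight `lambda_ar` below are assumptions, not the cited paper's architecture), a shared encoder can feed both a CTC-based ASR objective and an utterance-level accent classification loss:

```python
# Minimal multi-task sketch: shared encoder, CTC head for ASR, pooled
# classifier head for accent recognition (AR). All layer choices and the
# loss weight lambda_ar are illustrative assumptions, not the cited model.
import torch
import torch.nn as nn

class JointASRAR(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=5000, accents=8):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        self.asr_head = nn.Linear(2 * hidden, vocab)    # frame-level token logits
        self.ar_head = nn.Linear(2 * hidden, accents)   # utterance-level accent logits

    def forward(self, feats):                # feats: (B, T, feat_dim)
        enc, _ = self.encoder(feats)         # (B, T, 2*hidden)
        return self.asr_head(enc), self.ar_head(enc.mean(dim=1))

def joint_loss(asr_logits, ar_logits, tokens, feat_lens, token_lens,
               accent_labels, lambda_ar=0.3):
    log_probs = asr_logits.log_softmax(-1).transpose(0, 1)   # (T, B, V) for CTCLoss
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, tokens,
                                                   feat_lens, token_lens)
    ce = nn.CrossEntropyLoss()(ar_logits, accent_labels)
    return ctc + lambda_ar * ce              # weighted multi-task objective
```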

A robust accent classification system based on variational mode decomposition

D Subhash, B Premjith, V Ravi - Engineering Applications of Artificial …, 2025 - Elsevier
State-of-the-art automatic speech recognition models often struggle to capture nuanced
features inherent in accented speech, leading to sub-optimal performance in speaker …
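
The title names variational mode decomposition (VMD) as the front end. As a hedged sketch only (the `vmdpy` package, its parameter values, and the per-mode energy features are assumptions; the snippet does not describe the paper's pipeline), VMD can split a waveform into band-limited modes whose statistics feed a downstream accent classifier:

```python
# Hedged sketch: VMD front end for accent classification. The vmdpy package
# and all parameter values below are illustrative assumptions.
import numpy as np
from vmdpy import VMD            # third-party VMD implementation (pip install vmdpy)

signal = np.random.randn(16000)  # stand-in for 1 s of 16 kHz speech
alpha, tau, K, DC, init, tol = 2000, 0.0, 5, 0, 1, 1e-7
modes, modes_hat, center_freqs = VMD(signal, alpha, tau, K, DC, init, tol)

# One toy feature per mode (energy); a real system would compute richer
# per-mode statistics before passing them to a classifier.
features = np.array([(m ** 2).mean() for m in modes])   # shape (K,)
```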

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

W Liu, K Fu, X Tian, S Shi, W Li, Z Ma… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recent studies on pronunciation scoring have explored the effect of introducing phone
embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or …
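
The snippet describes using phone embeddings as reference pronunciations. Below is a minimal sketch of one explicit way such a reference could be compared with the realized acoustics (cosine similarity between a canonical phone embedding and a pooled acoustic embedding per phone segment); the function names, tensor shapes, and similarity choice are assumptions, not the cited method:

```python
# Hedged sketch: per-phone pronunciation scores from the similarity between
# canonical phone embeddings and pooled acoustic segment embeddings.
import torch
import torch.nn.functional as F

def phone_level_scores(acoustic_segs, phone_ids, phone_table):
    """acoustic_segs: (N, D) pooled acoustic embeddings, one per phone segment.
    phone_ids: (N,) canonical phone indices; phone_table: nn.Embedding."""
    ref = phone_table(phone_ids)                              # (N, D) reference embeddings
    return F.cosine_similarity(acoustic_segs, ref, dim=-1)    # (N,) per-phone scores

phone_table = torch.nn.Embedding(num_embeddings=70, embedding_dim=128)
segs = torch.randn(5, 128)                    # 5 phone segments (dummy data)
ids = torch.tensor([3, 10, 10, 42, 7])
scores = phone_level_scores(segs, ids, phone_table)
print(scores, scores.mean())                  # utterance score could be the mean
```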

MBCFNet: A Multimodal Brain–Computer Fusion Network for human intention recognition

Z Li, G Zhang, S Okada, L Wang, B Zhao… - Knowledge-Based …, 2024 - Elsevier
Accurate recognition of human intent is crucial for effective human–computer speech
interaction. Numerous intent-understanding studies have been based on speech-to-text …

Enhanced cross-modal parallel training for improving end-to-end accented speech recognition

R Dong, J Chen, Y Long, Y Li, D Xu - Speech Communication, 2025 - Elsevier
The inherent variability in pronunciation across different accents presents a significant
challenge to accurate speech recognition, greatly impairing the performance of current end …

[PDF] Self-supervised learning representation based accent recognition with persistent accent memory

R Li, Z **e, H Xu, Y Peng, H Liu, H Huang… - Proceedings of the …, 2023 - isca-archive.org
Accent recognition (AR) is challenging due to the lack of training data, as well as the
entanglement of accents with speaker and regional characteristics. This paper aims to improve AR …

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

B Mu, Y Li, Q Shao, K Wei, X Wan, N Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite notable advancements in automatic speech recognition (ASR), performance tends
to degrade when faced with adverse conditions. Generative error correction (GER) …
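
Generative error correction (GER) here refers to having a language model rewrite ASR output. As a rough, hedged sketch (the prompt wording, the Hugging Face text-generation pipeline, and the `gpt2` placeholder model are illustrative assumptions, not the MMGER setup), an LLM can be prompted with the N-best ASR hypotheses and asked to produce a corrected transcript:

```python
# Hedged sketch of N-best generative error correction with an LLM.
from transformers import pipeline

def correct_with_llm(nbest, generator):
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    prompt = (
        "The following are candidate transcripts of the same utterance from "
        "a speech recognizer. Output the single most likely correct transcript.\n"
        f"{hyps}\nCorrected transcript:"
    )
    out = generator(prompt, max_new_tokens=64, do_sample=False)
    return out[0]["generated_text"]

generator = pipeline("text-generation", model="gpt2")   # placeholder model only
print(correct_with_llm(["i red a book", "i read a book", "eye read a book"], generator))
```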

A Multi-modal Approach to Dysarthria Detection and Severity Assessment Using Speech and Text Information

K Gurugubelli, AK Vuppala - arXiv preprint arXiv:2412.16874, 2024 - arxiv.org
Automatic detection and severity assessment of dysarthria are crucial for delivering targeted
therapeutic interventions to patients. While most existing research focuses primarily on …