[HTML] Automatic Speech Recognition: A survey of deep learning techniques and approaches
H Ahlawat, N Aggarwal, D Gupta - International Journal of Cognitive …, 2025 - Elsevier
Significant research has been conducted during the last decade on the application of
machine learning for speech processing, particularly speech recognition. However, in recent …
SAMU-XLSR: Semantically-aligned multimodal utterance-level cross-lingual speech representation
We propose the SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech
Representation learning framework. Unlike previous works on speech representation …
Prosody is not identity: A speaker anonymization approach using prosody cloning
Prosody is closely linked to the identity of a speaker, leading to individual pitch and
intonation patterns. Therefore, it is challenging in speaker anonymization to generate …
Anonymizing speech with generative adversarial networks to preserve speaker privacy
In order to protect the privacy of speech data, speaker anonymization aims for hiding the
identity of a speaker by changing the voice in speech recordings. This typically comes with a …
A comparative study on non-autoregressive modelings for speech-to-text generation
Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence,
which significantly reduces inference time at the cost of an accuracy drop compared to …
Investigating self-supervised pretraining frameworks for pathological speech recognition
We investigate the performance of self-supervised pretraining frameworks on pathological
speech datasets used for automatic speech recognition (ASR). Modern end-to-end models …
Speaker anonymization with phonetic intermediate representations
In this work, we propose a speaker anonymization pipeline that leverages high quality
automatic speech recognition and synthesis systems to generate speech conditioned on …
[HTML] Mispronunciation detection and diagnosis with articulatory-level feedback generation for non-native Arabic speech
M Algabri, H Mathkour, M Alsulaiman, MA Bencherif - Mathematics, 2022 - mdpi.com
A high-performance versatile computer-assisted pronunciation training (CAPT) system that
provides the learner immediate feedback as to whether their pronunciation is correct is very …
Improving noise robustness of contrastive speech representation learning with speech reconstruction
Noise robustness is essential for deploying automatic speech recognition (ASR) systems in
real-world environments. One way to reduce the effect of noise interference is to employ a …
Cyclic transfer learning for Mandarin-English code-switching speech recognition
Transfer learning is a common method to improve the performance of the model on a target
task via pre-training the model on pretext tasks. Different from the methods using …