Google Scholar

H Ahlawat, N Aggarwal, D Gupta - International Journal of Cognitive …, 2025 - Elsevier

Significant research has been conducted during the last decade on the application of
machine learning for speech processing, particularly speech recognition. However, in recent …

Cited by 1

SAMU-XLSR: Semantically-aligned multimodal utterance-level cross-lingual speech representation

S Khurana, A Laurent, J Glass - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

We propose the SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
Speech Representation learning framework. Unlike previous works on speech representation …

Cited by 43

Prosody is not identity: A speaker anonymization approach using prosody cloning

S Meyer, F Lux, J Koch, P Denisov… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Prosody is closely linked to the identity of a speaker, leading to individual pitch and
intonation patterns. Therefore, it is challenging in speaker anonymization to generate …

Cited by 31

Anonymizing speech with generative adversarial networks to preserve speaker privacy

S Meyer, P Tilli, P Denisov, F Lux… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

In order to protect the privacy of speech data, speaker anonymization aims for hiding the
identity of a speaker by changing the voice in speech recordings. This typically comes with a …

Cited by 34

A comparative study on non-autoregressive modelings for speech-to-text generation

Y Higuchi, N Chen, Y Fujita, H Inaguma… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence,
which significantly reduces inference time at the cost of an accuracy drop compared to …

Cited by 49

Investigating self-supervised pretraining frameworks for pathological speech recognition

LP Violeta, WC Huang, T Toda - arXiv preprint arXiv:2203.15431, 2022 - arxiv.org

We investigate the performance of self-supervised pretraining frameworks on pathological
speech datasets used for automatic speech recognition (ASR). Modern end-to-end models …

Cited by 38

Speaker anonymization with phonetic intermediate representations

S Meyer, F Lux, P Denisov, J Koch, P Tilli… - arXiv preprint arXiv …, 2022 - arxiv.org

In this work, we propose a speaker anonymization pipeline that leverages high quality
automatic speech recognition and synthesis systems to generate speech conditioned on …

Cited by 28

Mispronunciation detection and diagnosis with articulatory-level feedback generation for non-native Arabic speech

M Algabri, H Mathkour, M Alsulaiman, MA Bencherif - Mathematics, 2022 - mdpi.com

A high-performance versatile computer-assisted pronunciation training (CAPT) system that
provides the learner immediate feedback as to whether their pronunciation is correct is very …

Cited by 25

Improving noise robustness of contrastive speech representation learning with speech reconstruction

H Wang, Y Qian, X Wang, Y Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Noise robustness is essential for deploying automatic speech recognition (ASR) systems in
real-world environments. One way to reduce the effect of noise interference is to employ a …

Cited by 29

Cyclic transfer learning for Mandarin-English code-switching speech recognition

CH Nga, DQ Vu, HH Luong, CL Huang… - IEEE Signal …, 2023 - ieeexplore.ieee.org

Transfer learning is a common method to improve the performance of the model on a target
task via pre-training the model on pretext tasks. Different from the methods using …

Cited by 10

The 2020 espnet update: new features, broadened applications, performance improvements, and...

Automatic Speech Recognition: A survey of deep learning techniques and approaches
