- Academic Search

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Cited by 17 | All 4 versions

[PDF] arxiv.org

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org

Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

Cited by 12 | All 3 versions

[PDF] isca-archive.org

[PDF] Mixed children/adult/childrenized fine-tuning for children's ASR: How to reduce age mismatch and speaking style mismatch

T Graave, Z Li, T Lohrenz, T Fingscheidt - Proc. Interspeech 2024, 2024 - isca-archive.org

Today's end-to-end (E2E) ASR models achieve strong performance when applied to adult
speech, but deteriorate on children's speech. Most E2E ASR models are pre-trained on adult …

Cited by 2 | All 3 versions

[PDF] arxiv.org

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

A Rouditchenko, Y Gong, S Thomas, L Karlinsky… - arXiv preprint arXiv …, 2024 - arxiv.org

Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in
noise. Since videos are harder to obtain than audio, the video training data of AVSR models …

Cited by 6

[PDF] arxiv.org

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

A Rouditchenko, S Bhati, S Thomas, H Kuehne… - arXiv preprint arXiv …, 2025 - arxiv.org

Audio-Visual Speech Recognition (AVSR) combines lip-based video with audio and can
improve performance in noise, but most methods are trained only on English data. One …

[PDF] arxiv.org

Probing self-supervised learning models with target speech extraction

J Peng, M Delcroix, T Ochiai, O Plchot… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org

Large-scale pre-trained self-supervised learning (SSL) models have shown remarkable
advancements in speech-related tasks. However, the utilization of these models in complex …

Cited by 1 | All 2 versions

[HTML] amazon.science

[HTML] Interleaved audio/audiovisual transfer learning for AV-ASR in low-resourced languages

Z Li, P Blumenberg, J Liu, T Graave, T Lohrenz… - 2024 - amazon.science

Cross-language transfer learning from English to a target language has shown effectiveness
in low-resourced audiovisual speech recognition (AV-ASR). We first investigate a 2-stage …

Cited by 1 | All 3 versions

[PDF] isca-archive.org

[PDF] Leveraging Adapter for Parameter-Efficient ASR Encoder

K Shim, J Lee, H Kim - Proc. Interspeech 2024, 2024 - isca-archive.org

The expansion of speech models emphasizes the importance of parameter efficiency in
practical automatic speech recognition (ASR) systems. Parameter sharing, which reuses the …

All 2 versions

[PDF] arxiv.org

CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification

J Peng, L Mošner, L Zhang, O Plchot… - arXiv preprint arXiv …, 2024 - arxiv.org

Self-supervised learning (SSL) models for speaker verification (SV) have gained significant
attention in recent years. However, existing SSL-based SV systems often struggle to capture …

All 2 versions

[PDF] aclanthology.org

ConEC: Earnings call dataset with real-world contexts for benchmarking contextual speech recognition

R Huang, M Yarmohammadi, J Trmal… - Proceedings of the …, 2024 - aclanthology.org

Knowing the particular context associated with a conversation can help improve the
performance of an automatic speech recognition (ASR) system. For example, if we are …

Cited by 1 | All 3 versions
