Layer-wise analysis of a self-supervised speech representation model

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Recently proposed self-supervised learning approaches have been successful for
pre-training speech representation models. The utility of these learned representations has been …

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - arXiv preprint arXiv:2307.00162, 2023 - arxiv.org
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …

Analyzing acoustic word embeddings from pre-trained self-supervised speech models

R Sanabria, H Tang, S Goldwater - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Given the strong results of self-supervised models on various tasks, there have been
surprisingly few studies exploring self-supervised representations for acoustic word …

Understanding shared speech-text representations

G Wang, K Kastner, A Bapna, Z Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recently, a number of approaches to train speech models by incorporating text into
end-to-end models have been developed, with Maestro advancing state-of-the-art automatic …

DP-Parse: Finding word boundaries from raw speech with an instance lexicon

R Algayres, T Ricoul, J Karadayi… - Transactions of the …, 2022 - direct.mit.edu
Finding word boundaries in continuous speech is challenging as there is little or no
equivalent of a 'space' delimiter between words. Popular Bayesian non-parametric models …

Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation

C Jacobs, Y Matusevych… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length
speech segments. For zero-resource languages where labelled data is not available, one …

CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks

G Beguš - Neural Networks, 2021 - Elsevier
How can deep neural networks encode information that corresponds to words in human
speech into raw acoustic data? This paper proposes two neural network architectures for …

Discovering phonetic inventories with crosslingual automatic speech recognition

P Żelasko, S Feng, LM Velazquez, A Abavisani… - Computer Speech & …, 2022 - Elsevier
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model
training problematic for most existing languages, including languages that do not even have …