Layer-wise analysis of a self-supervised speech representation model
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …
training speech representation models. The utility of these learned representations has been …
Cross-modal contrastive learning for speech translation
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …
Learning similar representations for semantically similar speech and text is important for …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …
improving performance and data efficiency on various speech tasks. However, these …
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
producing performance and data efficiency improvements for a variety of speech tasks …
producing performance and data efficiency improvements for a variety of speech tasks …
Analyzing acoustic word embeddings from pre-trained self-supervised speech models
Given the strong results of self-supervised models on various tasks, there have been
surprisingly few studies exploring self-supervised representations for acoustic word …
surprisingly few studies exploring self-supervised representations for acoustic word …
Understanding shared speech-text representations
Recently, a number of approaches to train speech models by incorporating text into end-to-
end models have been developed, with Maestro advancing state-of-the-art automatic …
end models have been developed, with Maestro advancing state-of-the-art automatic …
DP-Parse: Finding word boundaries from raw speech with an instance lexicon
R Algayres, T Ricoul, J Karadayi… - Transactions of the …, 2022 - direct.mit.edu
Finding word boundaries in continuous speech is challenging as there is little or no
equivalent of a 'space'delimiter between words. Popular Bayesian non-parametric models …
equivalent of a 'space'delimiter between words. Popular Bayesian non-parametric models …
Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation
Acoustic word embeddings (AWEs) are fixed-dimensional representations of variable-length
speech segments. For zero-resource languages where labelled data is not available, one …
speech segments. For zero-resource languages where labelled data is not available, one …
[HTML][HTML] CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks
G Beguš - Neural Networks, 2021 - Elsevier
How can deep neural networks encode information that corresponds to words in human
speech into raw acoustic data? This paper proposes two neural network architectures for …
speech into raw acoustic data? This paper proposes two neural network architectures for …
Discovering phonetic inventories with crosslingual automatic speech recognition
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model
training problematic for most existing languages, including languages that do not even have …
training problematic for most existing languages, including languages that do not even have …