PARP: Prune, adjust and re-prune for self-supervised speech recognition
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …
Word segmentation on discovered phone units with dynamic programming and self-supervised scoring
H Kamper - IEEE/ACM Transactions on Audio, Speech, and …, 2022 - ieeexplore.ieee.org
Recent work on unsupervised speech segmentation has used self-supervised models with
phone and word segmentation modules that are trained jointly. This paper instead revisits …
Autoregressive predictive coding: A comprehensive study
We review autoregressive predictive coding (APC), an approach to learn speech
representation by predicting a future frame given the past frames. We present three different …
Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …
On compressing sequences for self-supervised speech models
Compressing self-supervised models has become increasingly necessary, as self-supervised
models become larger. While previous approaches have primarily focused on …
Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
We identify a performance trade-off between the tasks of phoneme categorization and
phoneme and word segmentation in several self-supervised learning algorithms based on …
Audio-visual neural syntax acquisition
We study phrase structure induction from visually-grounded speech. The core idea is to first
segment the speech waveform into sequences of word segments, and subsequently induce …
textless-lib: A library for textless spoken language processing
Textless spoken language processing research aims to extend the applicability of standard
NLP toolset onto spoken language and languages with few or no textual resources. In this …
Autoregressive co-training for learning discrete speech representations
While several self-supervised approaches for learning discrete speech representation have
been proposed, it is unclear how these seemingly similar approaches relate to each other. In …
Self-supervised learning with segmental masking for speech representation
Self-supervised learning has achieved remarkable success for learning speech
representations from unlabeled data. The masking strategy plays an important role in the …