PARP: Prune, adjust and re-prune for self-supervised speech recognition

CIJ Lai, Y Zhang, AH Liu, S Chang… - Advances in …, 2021 - proceedings.neurips.cc
Self-supervised speech representation learning (speech SSL) has demonstrated the benefit
of scale in learning rich representations for Automatic Speech Recognition (ASR) with …

Word segmentation on discovered phone units with dynamic programming and self-supervised scoring

H Kamper - IEEE/ACM Transactions on Audio, Speech, and …, 2022 - ieeexplore.ieee.org
Recent work on unsupervised speech segmentation has used self-supervised models with
phone and word segmentation modules that are trained jointly. This paper instead revisits …

Autoregressive predictive coding: A comprehensive study

GP Yang, SL Yeh, YA Chung, J Glass… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
We review autoregressive predictive coding (APC), an approach to learn speech
representation by predicting a future frame given the past frames. We present three different …
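The APC objective in the snippet above (predict a future frame from the past frames only) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. Several choices here are assumptions: the L1 distance, averaging over positions, the `shift` parameter (how many frames ahead to predict), and the hypothetical `copy_last_frame` predictor. A real APC model would replace that predictor with a trained autoregressive network such as an RNN.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def apc_l1_loss(frames: Sequence[Frame],
                predictor: Callable[[Sequence[Frame]], Frame],
                shift: int = 3) -> float:
    """Mean L1 distance between predicted and true frames `shift` steps ahead.

    At each time t the predictor sees only frames[0..t] (the past and the
    current frame) and must estimate frames[t + shift]. The L1 distance and
    the averaging over positions are assumptions of this sketch.
    """
    if shift < 1:
        raise ValueError("shift must be >= 1")
    total, count = 0.0, 0
    for t in range(len(frames) - shift):
        pred = predictor(frames[: t + 1])  # causal: no access to future frames
        target = frames[t + shift]
        total += sum(abs(p - y) for p, y in zip(pred, target))
        count += 1
    if count == 0:
        raise ValueError("sequence must contain at least shift + 1 frames")
    return total / count


def copy_last_frame(past: Sequence[Frame]) -> Frame:
    # Hypothetical stand-in for a trained autoregressive encoder:
    # it simply predicts that the most recent frame will repeat.
    return list(past[-1])


if __name__ == "__main__":
    # Toy 1-D "features" that increase by 1 per frame.
    toy = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(apc_l1_loss(toy, copy_last_frame, shift=2))
```

Predicting several frames ahead (`shift` > 1) rather than only the next frame is meant to discourage the model from exploiting the strong local smoothness of speech features. The copy-last-frame baseline makes that cost visible: its error grows with `shift`.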

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

S Cuervo, A Lancucki, R Marxer… - Advances in …, 2022 - proceedings.neurips.cc
The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …

On compressing sequences for self-supervised speech models

Y Meng, HJ Chen, J Shi, S Watanabe… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Compressing self-supervised models has become increasingly necessary, as self-supervised
models become larger. While previous approaches have primarily focused on …

Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words

S Cuervo, M Grabias, J Chorowski… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We identify a performance trade-off between the tasks of phoneme categorization and
phoneme and word segmentation in several self-supervised learning algorithms based on …

Audio-visual neural syntax acquisition

CIJ Lai, F Shi, P Peng, Y Kim, K Gimpel… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We study phrase structure induction from visually-grounded speech. The core idea is to first
segment the speech waveform into sequences of word segments, and subsequently induce …

textless-lib: A library for textless spoken language processing

E Kharitonov, J Copet, K Lakhotia, TA Nguyen… - arXiv preprint arXiv …, 2022 - arxiv.org
Textless spoken language processing research aims to extend the applicability of standard
NLP toolset onto spoken language and languages with few or no textual resources. In this …

Autoregressive co-training for learning discrete speech representations

SL Yeh, H Tang - arXiv preprint arXiv:2203.15840, 2022 - arxiv.org
While several self-supervised approaches for learning discrete speech representation have
been proposed, it is unclear how these seemingly similar approaches relate to each other. In …

Self-supervised learning with segmental masking for speech representation

X Yue, J Lin, FR Gutierrez, H Li - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
Self-supervised learning has achieved remarkable success for learning speech
representations from unlabeled data. The masking strategy plays an important role in the …