Efficient self-supervised learning with contextualized target representations for vision, speech and language

A Baevski, A Babu, WN Hsu… - … Conference on Machine …, 2023 - proceedings.mlr.press
Current self-supervised learning algorithms are often modality-specific and require large
amounts of computational resources. To address these issues, we increase the training …

Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining

K Nowakowski, M Ptaszynski, K Murasaki… - Information Processing …, 2023 - Elsevier
In recent years, neural models learned through self-supervised pretraining on large scale
multilingual text or speech data have exhibited promising results for underresourced …

On compressing sequences for self-supervised speech models

Y Meng, HJ Chen, J Shi, S Watanabe… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Compressing self-supervised models has become increasingly necessary, as
self-supervised models become larger. While previous approaches have primarily focused on …

On the (in)efficiency of acoustic feature extractors for self-supervised speech representation learning

T Parcollet, S Zhang, R van Dalen, AGCP Ramos… - Interspeech 2023, 2023 - hal.science
Speech representations learned with self-supervised learning (SSL) have the potential to
significantly improve the performance of a number of audio applications, especially when …

Efficiency-oriented approaches for self-supervised speech representation learning

L Lugo, V Vielzeuf - International Journal of Speech Technology, 2024 - Springer
Self-supervised learning enables the training of large neural models without the need for
large, labeled datasets. It has been generating breakthroughs in several fields, including …

Towards efficient self-supervised representation learning in speech processing

L Lugo, V Vielzeuf - Findings of the Association for Computational …, 2024 - aclanthology.org
Self-supervised learning has achieved impressive results in speech processing, but current
models are computationally expensive, generating environmental concerns because of their …

Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

AT Liu, YC Lin, H Wu, S Winkler… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Despite their impressive success, training foundation models remains computationally
costly. This paper investigates how to efficiently train speech foundation models with self …

Sustainable self-supervised learning for speech representations

L Lugo, V Vielzeuf - arXiv preprint arXiv:2406.07696, 2024 - arxiv.org
Sustainable artificial intelligence focuses on data, hardware, and algorithms to make
machine learning models more environmentally responsible. In particular, machine learning …

Once-for-all sequence compression for self-supervised speech models

HJ Chen, Y Meng, H Lee - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
The sequence length along the time axis is often the dominant factor of the computation in
speech processing. Works have been proposed to reduce the sequence length for lowering …

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

S Jeon, CF Yeh, H Inan, WN Hsu… - … , Speech, and Signal …, 2024 - ieeexplore.ieee.org
In this paper, we show that a simple audio language model can achieve comparable
inference efficiency to more complicated pre-trained models with speech transformer …