Efficient self-supervised learning with contextualized target representations for vision, speech and language
Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training …
Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining
In recent years, neural models learned through self-supervised pretraining on large-scale multilingual text or speech data have exhibited promising results for underresourced …
On compressing sequences for self-supervised speech models
Compressing self-supervised models has become increasingly necessary, as self-supervised models become larger. While previous approaches have primarily focused on …
On the (in)efficiency of acoustic feature extractors for self-supervised speech representation learning
Speech representations learned with self-supervised learning (SSL) have the potential to significantly improve the performance of a number of audio applications, especially when …
Efficiency-oriented approaches for self-supervised speech representation learning
Self-supervised learning enables the training of large neural models without the need for large, labeled datasets. It has been generating breakthroughs in several fields, including …
Towards efficient self-supervised representation learning in speech processing
Self-supervised learning has achieved impressive results in speech processing, but current models are computationally expensive, generating environmental concerns because of their …
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Despite their impressive success, training foundation models remains computationally costly. This paper investigates how to efficiently train speech foundation models with self …
Sustainable self-supervised learning for speech representations
Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning …
Once-for-all sequence compression for self-supervised speech models
The sequence length along the time axis is often the dominant factor of the computation in speech processing. Works have been proposed to reduce the sequence length for lowering …
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency
In this paper, we show that a simple audio language model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer …