E-Branchformer: Branchformer with enhanced merging for speech recognition
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …
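To make the "combining convolution and self-attention sequentially" pattern concrete, below is a minimal PyTorch sketch: a multi-head self-attention sublayer (global context) followed by a depthwise temporal convolution (local context), each with a residual connection. The class name, sublayer order, and all hyperparameters are illustrative assumptions, not the Conformer or E-Branchformer implementation.

```python
import torch
import torch.nn as nn

class AttentionThenConv(nn.Module):
    """Sketch of a Conformer-style block: self-attention for global
    context, then a depthwise temporal convolution for local context.
    All hyperparameters are illustrative."""

    def __init__(self, dim: int = 256, heads: int = 4, kernel: int = 31):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        # Depthwise conv: one filter per channel; padding keeps sequence length.
        self.depthwise = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h)    # global information via self-attention
        x = x + a
        # Conv1d expects (batch, dim, time), hence the transposes.
        c = self.depthwise(self.conv_norm(x).transpose(1, 2)).transpose(1, 2)
        return x + c                 # local information via convolution

x = torch.randn(2, 100, 256)            # (batch, frames, features)
print(AttentionThenConv()(x).shape)     # torch.Size([2, 100, 256])
```

Branchformer-style models instead run the attention and convolution branches in parallel and merge their outputs; E-Branchformer's contribution is an enhanced merging module.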
Comparative layer-wise analysis of self-supervised speech models
Many self-supervised speech models, varying in their pre-training objective, input modality,
and pre-training data, have been proposed in the last few years. Despite impressive …
HYPE: Hyperbolic entailment filtering for underspecified images and texts
In an era where the volume of data drives the effectiveness of self-supervised learning, the
specificity and clarity of data semantics play a crucial role in model training. Addressing this …
On the utility of self-supervised models for prosody-related tasks
Self-Supervised Learning (SSL) from speech data has produced models that have achieved
remarkable performance in many tasks, and that are known to implicitly represent many …
SLUE Phase-2: A benchmark suite of diverse spoken language understanding tasks
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …
Supporting newsrooms with journalistic knowledge graph platforms: Current state and future directions
Increasing competition and loss of revenues force newsrooms to explore new digital
solutions. The new solutions employ artificial intelligence and big data techniques such as …
Wav2Seq: Pre-training speech-to-text encoder-decoder models using pseudo languages
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-
decoder models for speech data. We induce a pseudo language as a compact discrete …
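As one way to make a "pseudo language as a compact discrete" representation concrete, the sketch below quantizes frame-level speech features with k-means and collapses consecutive repeated cluster IDs. The feature source, cluster count, and any further subword step (e.g., BPE) are assumptions based on common practice, not necessarily the paper's exact recipe.

```python
import numpy as np
from itertools import groupby
from sklearn.cluster import KMeans

# Toy stand-in for frame-level features from a pretrained speech encoder.
feats = np.random.randn(500, 39).astype(np.float32)   # (frames, feat_dim)

# 1) Quantize each frame into a small discrete vocabulary.
km = KMeans(n_clusters=64, n_init=10, random_state=0).fit(feats)
frame_ids = km.predict(feats)                         # one cluster ID per frame

# 2) Collapse consecutive repeats so the token sequence is compact.
pseudo_tokens = [int(k) for k, _ in groupby(frame_ids)]
print(len(frame_ids), "frames ->", len(pseudo_tokens), "pseudo tokens")
```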
Exploring the capability of Mamba in speech applications
This paper explores the capability of Mamba, a recently proposed architecture based on
state space models (SSMs), as a competitive alternative to Transformer-based models. In …
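For readers unfamiliar with SSMs, the sketch below implements the core discrete-time linear state-space recurrence, h_t = A h_{t-1} + B x_t, y_t = C h_t, as a sequential scan. Mamba's input-dependent ("selective") parameterization and hardware-efficient parallel scan are omitted; all shapes are illustrative.

```python
import torch

def ssm_scan(x, A, B, C):
    """Minimal discrete-time linear state space model:
        h_t = A h_{t-1} + B x_t,   y_t = C h_t
    x: (time, d_in); A: (d_state, d_state); B: (d_state, d_in);
    C: (d_out, d_state). Mamba makes A, B, C depend on the input;
    this sketch keeps them fixed."""
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                  # sequential scan over time
        h = A @ h + B @ x_t        # state update
        ys.append(C @ h)           # readout
    return torch.stack(ys)

T, d_in, d_state, d_out = 50, 8, 16, 8
x = torch.randn(T, d_in)
A = 0.9 * torch.eye(d_state)       # stable (contractive) state transition
B = 0.1 * torch.randn(d_state, d_in)
C = 0.1 * torch.randn(d_out, d_state)
print(ssm_scan(x, A, B, C).shape)  # torch.Size([50, 8])
```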
What do self-supervised speech models know about words?
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …
Structured pruning of self-supervised pre-trained models for speech recognition and understanding
Self-supervised speech representation learning (SSL) has been shown to be effective in various
downstream tasks, but SSL models are usually large and slow. Model compression …
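To illustrate what structured pruning means in contrast to unstructured weight masking, the sketch below removes whole output units of a linear layer by an L2-norm importance score, yielding a genuinely smaller (hence faster) layer. The norm criterion and keep ratio are illustrative assumptions; the paper's approach learns which structures to prune rather than scoring them by norms.

```python
import torch
import torch.nn as nn

def prune_linear_rows(layer: nn.Linear, keep_ratio: float = 0.5) -> nn.Linear:
    """Drop the least-important output units of a Linear layer.
    Importance here is the L2 norm of each unit's weights (illustrative)."""
    importance = layer.weight.norm(dim=1)              # one score per output unit
    k = max(1, int(keep_ratio * layer.out_features))
    keep = importance.topk(k).indices.sort().values    # indices of kept units
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

layer = nn.Linear(768, 3072)
small = prune_linear_rows(layer, keep_ratio=0.25)
print(small)  # Linear(in_features=768, out_features=768, bias=True)
```

Because entire units are removed, any downstream layer must have its input dimension shrunk to match, which is what distinguishes structured pruning from simply zeroing individual weights.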