E-Branchformer: Branchformer with enhanced merging for speech recognition

K Kim, F Wu, Y Peng, J Pan, P Sridhar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …

Comparative layer-wise analysis of self-supervised speech models

A Pasad, B Shi, K Livescu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Many self-supervised speech models, varying in their pre-training objective, input modality,
and pre-training data, have been proposed in the last few years. Despite impressive …

HYPE: Hyperbolic entailment filtering for underspecified images and texts

W Kim, S Chun, T Kim, D Han, S Yun - European Conference on Computer …, 2024 - Springer
In an era where the volume of data drives the effectiveness of self-supervised learning, the
specificity and clarity of data semantics play a crucial role in model training. Addressing this …

On the utility of self-supervised models for prosody-related tasks

GT Lin, CL Feng, WP Huang, Y Tseng… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Self-Supervised Learning (SSL) from speech data has produced models that have achieved
remarkable performance in many tasks, and that are known to implicitly represent many …

SLUE Phase-2: A benchmark suite of diverse spoken language understanding tasks

S Shon, S Arora, CJ Lin, A Pasad, F Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Spoken language understanding (SLU) tasks have been studied for many decades in the
speech research community, but have not received as much attention as lower-level tasks …

Supporting newsrooms with journalistic knowledge graph platforms: Current state and future directions

M Gallofré Ocaña, AL Opdahl - Technologies, 2022 - mdpi.com
Increasing competition and loss of revenues force newsrooms to explore new digital
solutions. The new solutions employ artificial intelligence and big data techniques such as …

Wav2Seq: Pre-training speech-to-text encoder-decoder models using pseudo languages

F Wu, K Kim, S Watanabe, KJ Han… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-
decoder models for speech data. We induce a pseudo language as a compact discrete …

Exploring the capability of Mamba in speech applications

K Miyazaki, Y Masuyama, M Murata - arXiv preprint arXiv:2406.16808, 2024 - arxiv.org
This paper explores the capability of Mamba, a recently proposed architecture based on
state space models (SSMs), as a competitive alternative to Transformer-based models. In …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Structured pruning of self-supervised pre-trained models for speech recognition and understanding

Y Peng, K Kim, F Wu, P Sridhar… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Self-supervised speech representation learning (SSL) has been shown to be effective in various
downstream tasks, but SSL models are usually large and slow. Model compression …