DASB - Discrete Audio and Speech Benchmark

P Mousavi, L Della Libera, J Duret, A Ploujnikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Discrete audio tokens have recently gained considerable attention for their potential to
connect audio and language processing, enabling the creation of modern multimodal large …

Self-supervised speech representations are more phonetic than semantic

K Choi, A Pasad, T Nakamura, S Fukayama… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised speech models (S3Ms) have become an effective backbone for speech
applications. Various analyses suggest that S3Ms encode linguistic properties. In this work …

The Interspeech 2024 challenge on speech processing using discrete units

X Chang, J Shi, J Tian, Y Wu, Y Tang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Representing speech and audio signals in discrete units has become a compelling
alternative to traditional high-dimensional feature vectors. Numerous studies have …

ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …
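As an illustrative aside (not ESPnet-Codec's implementation), neural audio codecs commonly discretize encoder outputs with residual vector quantization, where each codebook quantizes the residual left by the previous stage. The NumPy sketch below uses random codebooks purely for demonstration; a real codec learns them jointly with its encoder and decoder.

import numpy as np

rng = np.random.default_rng(0)
n_quantizers, codebook_size, dim = 4, 256, 64

# Hypothetical codebooks; a trained codec learns these end to end.
codebooks = rng.normal(size=(n_quantizers, codebook_size, dim))

def rvq_encode(frames, codebooks):
    # Each stage picks the nearest codeword and passes the residual to the next stage.
    residual = frames.copy()
    codes = []
    for cb in codebooks:
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes)          # (n_quantizers, n_frames) integer codes

def rvq_decode(codes, codebooks):
    # Reconstruction is the sum of the selected codewords across all stages.
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))

frames = rng.normal(size=(100, dim))   # stand-in for codec encoder outputs
codes = rvq_encode(frames, codebooks)
recon = rvq_decode(codes, codebooks)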

MMM: Multi-layer multi-residual multi-stream discrete speech representation from self-supervised learning model

J Shi, X Ma, H Inaguma, A Sun, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Speech discrete representation has proven effective in various downstream applications
due to its superior compression rate of the waveform, fast convergence during training, and …

How should we extract discrete audio tokens from self-supervised models?

P Mousavi, J Duret, S Zaiem, L Della Libera… - arXiv preprint arXiv …, 2024 - arxiv.org
Discrete audio tokens have recently gained attention for their potential to bridge the gap
between audio and language processing. Ideal audio tokens must preserve content …
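Several of the entries above concern deriving discrete tokens from self-supervised speech models. As a purely illustrative aside (not the method of any specific paper listed here), a commonly used recipe is to run k-means over hidden-layer features of a pretrained model and treat cluster indices as tokens; the sketch below assumes scikit-learn and uses a random matrix as a stand-in for real SSL features.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for frame-level hidden states from one SSL layer (e.g. a HuBERT layer);
# in practice these would be extracted from real speech. Shape: (frames, hidden_dim).
train_features = rng.normal(size=(2000, 768))

# Fit the token inventory; the number of clusters (here 100) is a design choice.
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(train_features)

# Tokenize a new utterance: each frame maps to the index of its nearest centroid.
utterance_features = rng.normal(size=(300, 768))
tokens = kmeans.predict(utterance_features)  # shape (300,), integer unit IDs in [0, 100)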

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding

S Shon, K Kim, YT Hsu, P Sridhar, S Watanabe… - arXiv preprint arXiv …, 2024 - arxiv.org
The integration of pre-trained text-based large language models (LLM) with speech input
has enabled instruction-following capabilities for diverse speech tasks. This integration …

mHuBERT-147: A compact multilingual HuBERT model

MZ Boito, V Iyer, N Lagos, L Besacier… - arXiv preprint arXiv …, 2024 - arxiv.org
We present mHuBERT-147, the first general-purpose massively multilingual HuBERT
speech representation model trained on 90K hours of clean, open-license data. To scale up …

Scaling properties of speech language models

S Cuervo, R Marxer - arXiv preprint arXiv:2404.00685, 2024 - arxiv.org
Speech Language Models (SLMs) aim to learn language from raw audio, without textual
resources. Despite significant advances, our current models exhibit weak syntax and …

SpeechPrompt: Prompting speech language models for speech processing tasks

KW Chang, H Wu, YK Wang, YK Wu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Prompting has become a practical method for utilizing pre-trained language models (LMs).
This approach offers several advantages. It allows an LM to adapt to new tasks with minimal …