TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch

X Song, C Liang, B Zhang, P Zhang, ZY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Automatic Speech Recognition (ASR) models demand a vast number of parameters,
copious amounts of data, and significant computational resources during the training …

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Y Yang, Z Ma, S Liu, J Li, H Wang, L Meng… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces Interleaved Speech-Text Language Model (IST-LM) for streaming
zero-shot Text-to-Speech (TTS). Unlike many previous approaches, IST-LM is directly …