TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch
Large Automatic Speech Recognition (ASR) models demand a vast number of parameters,
copious amounts of data, and significant computational resources during the training …
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers
This paper introduces Interleaved Speech-Text Language Model (IST-LM) for streaming
zero-shot Text-to-Speech (TTS). Unlike many previous approaches, IST-LM is directly …