Llama-omni: Seamless speech interaction with large language models

Q Fang, S Guo, Y Zhou, Z Ma, S Zhang… - ar…
… the overall human perceptual
experience. While prevailing large language models (LLMs) and visual language models …

BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation

C Wang, M Liao, Z Huang, J Zhang - arXiv preprint arXiv:2405.19041, 2024 - arxiv.org
Recent end-to-end approaches have shown promise in extending large language models
(LLMs) to speech inputs, but face limitations in directly assessing and optimizing alignment …