Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

KH Lu, Z Chen, SW Fu, CHH Yang, J Balam… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities
of large language models (LLMs) by incorporating pre-trained speech models. However …

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

CY Kuan, H Lee - arXiv preprint arXiv:2410.16130, 2024 - arxiv.org
Recent advancements in large audio-language models (LALMs) have shown impressive
capabilities in understanding and reasoning about audio and speech information. However …

A Preliminary Exploration with GPT-4o Voice Mode

YX Lin, CK Yang, WC Chen, CA Li, C Huang… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rise of multimodal large language models, GPT-4o stands out as a pioneering
model, driving us to evaluate its capabilities. This report assesses GPT-4o across various …