Google Scholar

KH Lu, Z Chen, SW Fu, CHH Yang, J Balam… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent end-to-end speech language models (SLMs) have expanded upon the capabilities
of large language models (LLMs) by incorporating pre-trained speech models. However …

Cited by 1

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

CY Kuan, H Lee - arXiv preprint arXiv:2410.16130, 2024 - arxiv.org

Recent advancements in large audio-language models (LALMs) have shown impressive
capabilities in understanding and reasoning about audio and speech information. However …

A Preliminary Exploration with GPT-4o Voice Mode

YX Lin, CK Yang, WC Chen, CA Li, C Huang… - arXiv preprint arXiv …, 2025 - arxiv.org

With the rise of multimodal large language models, GPT-4o stands out as a pioneering
model, driving us to evaluate its capabilities. This report assesses GPT-4o across various …

Speech-Copilot: Leveraging Large Language Models for Speech Processing Via Task Decomposition,...

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
