Google Scholar

Qwen-audio: Advancing universal audio understanding via unified large-scale audio-language models

Y Chu, J Xu, X Zhou, Q Yang, S Zhang, Z Yan… - arxiv preprint arxiv …, 2023 - arxiv.org

Recently, instruction-following audio-language models have received broad attention for
audio interaction with humans. However, the absence of pre-trained audio models capable …

Cited by 245

Air-bench: Benchmarking large audio-language models via generative comprehension

Q Yang, J Xu, W Liu, Y Chu, Z Jiang, X Zhou… - arxiv preprint arxiv …, 2024 - arxiv.org

Recently, instruction-following audio-language models have received broad attention for
human-audio interaction. However, the absence of benchmarks capable of evaluating audio …

Cited by 38

OWSM v3.1: Better and faster open whisper-style speech models based on e-branchformer

Y Peng, J Tian, W Chen, S Arora, B Yan, Y Sudo… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent studies have highlighted the importance of fully open foundation models. The Open
Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper …

Cited by 37

Speechverse: A large-scale generalizable audio language model

N Das, S Dingliwal, S Ronanki, R Paturi… - arxiv preprint arxiv …, 2024 - arxiv.org

Large language models (LLMs) have shown incredible proficiency in performing tasks that
require semantic understanding of natural language instructions. Recently, many works …

Cited by 28

Viola: Conditional language models for speech recognition, synthesis, and translation

T Wang, L Zhou, Z Zhang, Y Wu, S Liu… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent research shows a big convergence in model architecture, training objectives, and
inference methods across various tasks for different modalities. In this paper, we propose …

Cited by 8

Cosmic: Data efficient instruction-tuning for speech in-context learning

J Pan, J Wu, Y Gaur, S Sivasankaran, Z Chen… - arxiv preprint arxiv …, 2023 - arxiv.org

We present a cost-effective method to integrate speech into a large language model (LLM),
resulting in a Contextual Speech Model with Instruction-following/in-context-learning …

Cited by 22

Ssdm: Scalable speech dysfluency modeling

J Lian, X Zhou, Z Ezzes, J Vonk… - Advances in neural …, 2025 - proceedings.neurips.cc

Speech dysfluency modeling is the core module for spoken language learning, and speech
therapy. However, there are three challenges. First, current state-of-the-art solutions …

Cited by 2

Bestow: Efficient and streamable speech language model with the best of two worlds in gpt and t5

Z Chen, H Huang, O Hrinchuk… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Incorporating speech understanding capabilities into pretrained large-language models has
become a vital research direction (SpeechLLM). The previous architectures can be …

Cited by 6

Retrieval augmented end-to-end spoken dialog models

M Wang, I Shafran, H Soltau, W Han… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We recently developed a joint speech and language model (SLM [1]) which fuses a
pretrained foundational speech model and a large language model (LLM), while preserving …

Cited by 12

Desta: Enhancing speech language models through descriptive speech-text alignment

KH Lu, Z Chen, SW Fu, H Huang, B Ginsburg… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent speech language models (SLMs) typically incorporate pre-trained speech models to
extend the capabilities from large language models (LLMs). In this paper, we propose a …

Cited by 5

Slm: Bridge the thin gap between speech and text foundation models

Qwen-audio: Advancing universal audio understanding via unified large-scale audio-language models