Google Scholar

H Wu, X Chen, YC Lin, K Chang, J Du… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …

Cited by 4 - All 4 versions


WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 8 - All 2 versions


BigCodec: Pushing the limits of low-bitrate neural speech codec

D Xin, X Tan, S Takamichi, H Saruwatari - arXiv preprint arXiv:2409.05377, 2024 - arxiv.org

We present BigCodec, a low-bitrate neural speech codec. While recent neural speech
codecs have shown impressive progress, their performance significantly deteriorates at low …

Cited by 6 - All 3 versions


Audio-Language Models for Audio-Centric Tasks: A survey

Y Su, J Bai, Q Xu, K Xu, Y Dou - arXiv preprint arXiv:2501.15177, 2025 - arxiv.org

Audio-Language Models (ALMs), which are trained on audio-text data, focus on the
processing, understanding, and reasoning of sounds. Unlike traditional supervised learning …

All 2 versions


Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

W Liu, Z Guo, J Xu, Y Lv, Y Chu, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

Building upon advancements in Large Language Models (LLMs), the field of audio
processing has seen increased interest in training audio generation tasks with discrete …

Cited by 1 - All 2 versions


LLMs are one-shot URL classifiers and explainers

F Rashid, N Ranaweera, B Doyle, S Seneviratne - Computer Networks, 2025 - Elsevier

Malicious URL classification represents a crucial aspect of cybersecurity. Although existing
work comprises numerous machine learning and deep learning-based URL classification …

All 3 versions


Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

TD Nguyen, JH Kim, J Choi, S Choi, J Park… - arXiv preprint arXiv …, 2024 - arxiv.org

The goal of this paper is to accelerate codec-based speech synthesis systems with minimum
sacrifice to speech quality. We propose an enhanced inference method that allows for …

All 2 versions


CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset

J Du, X Chen, H Wu, L Zhang, I Lin, I Chiu… - arXiv preprint arXiv …, 2025 - arxiv.org

With the rapid advancement of codec-based speech generation (CoSG) systems, creating
fake speech that mimics an individual's identity and spreads misinformation has become …

All 2 versions


Artificial Intelligence in Creative Industries: Advances Prior to 2025

N Anantrasirichai, F Zhang, D Bull - arXiv preprint arXiv:2501.02725, 2025 - arxiv.org

The rapid advancements in artificial intelligence (AI), particularly in generative AI and large
language models (LLMs), have profoundly impacted the creative industries by enabling …

All 2 versions


Recent Advances in Discrete Speech Tokens: A Review

Y Guo, Z Li, H Wang, B Li, C Shao, H Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org

The rapid advancement of speech generation technologies in the era of large language
models (LLMs) has established discrete speech tokens as a foundational paradigm for …


UniAudio 1.5: Large language model-driven audio codec is a few-shot audio task learner

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
