- Academic Search

WavChat: A Survey of Spoken Dialogue Models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 9

Recent Advances in Discrete Speech Tokens: A Review

Y Guo, Z Li, H Wang, B Li, C Shao, H Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org

The rapid advancement of speech generation technologies in the era of large language
models (LLMs) has established discrete speech tokens as a foundational paradigm for …

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition

Y Xie, X Wang, Z Wang, R Fu, Z Wen, S Cao… - arXiv preprint arXiv …, 2025 - arxiv.org

Current research in audio deepfake detection is gradually transitioning from binary
classification to multi-class tasks, referred as audio deepfake source tracing task. However …

Cited by 1

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

L Della Libera, F Paissan, C Subakan… - arXiv preprint arXiv …, 2025 - arxiv.org

Large language models have revolutionized natural language processing through
self-supervised pretraining on massive datasets. Inspired by this success, researchers have …

The ICME 2025 Audio Encoder Capability Challenge

J Zhang, H Dinkel, Q Song, H Wang, Y Niu… - arXiv preprint arXiv …, 2025 - arxiv.org

This challenge aims to evaluate the capabilities of audio encoders, especially in the context
of multi-task learning and real-world applications. Participants are invited to submit pre …

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

H Gao, H Shao, X Wang, C Qiu, Y Shen, S Cai… - arXiv preprint arXiv …, 2025 - arxiv.org

The film Her features Samantha, a sophisticated AI audio agent who is capable of
understanding both linguistic and paralinguistic information in human speech and delivering …

CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset

J Du, X Chen, H Wu, L Zhang, I Lin, I Chiu… - arXiv preprint arXiv …, 2025 - arxiv.org

With the rapid advancement of codec-based speech generation (CoSG) systems, creating
fake speech that mimics an individual's identity and spreads misinformation has become …

Artificial Intelligence in Creative Industries: Advances Prior to 2025

N Anantrasirichai, F Zhang, D Bull - arXiv preprint arXiv:2501.02725, 2025 - arxiv.org

The rapid advancements in artificial intelligence (AI), particularly in generative AI and large
language models (LLMs), have profoundly impacted the creative industries by enabling …

MARS6: A Small and Robust Hierarchical-Codec Text-to-Speech Model

M Baas, P Scholtz, A Mehta, E Dyson… - arXiv preprint arXiv …, 2025 - arxiv.org

Codec-based text-to-speech (TTS) models have shown impressive quality with zero-shot
voice cloning abilities. However, they often struggle with more expressive references or …

DAC-JAX: A JAX Implementation of the Descript Audio Codec

D Braun - arXiv preprint arXiv:2405.11554, 2024 - arxiv.org

We present an open-source implementation of the Descript Audio Codec (DAC) using
Google's JAX ecosystem of Flax, Optax, Orbax, AUX, and CLU. Our codebase enables the …
