Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

H Wu, X Chen, YC Lin, K Chang, J Du… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …

WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

BigCodec: Pushing the limits of low-bitrate neural speech codec

D Xin, X Tan, S Takamichi, H Saruwatari - arXiv preprint arXiv:2409.05377, 2024 - arxiv.org
We present BigCodec, a low-bitrate neural speech codec. While recent neural speech
codecs have shown impressive progress, their performance significantly deteriorates at low …

Audio-Language Models for Audio-Centric Tasks: A survey

Y Su, J Bai, Q Xu, K Xu, Y Dou - arXiv preprint arXiv:2501.15177, 2025 - arxiv.org
Audio-Language Models (ALMs), which are trained on audio-text data, focus on the
processing, understanding, and reasoning of sounds. Unlike traditional supervised learning …

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

W Liu, Z Guo, J Xu, Y Lv, Y Chu, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Building upon advancements in Large Language Models (LLMs), the field of audio
processing has seen increased interest in training audio generation tasks with discrete …

LLMs are one-shot URL classifiers and explainers

F Rashid, N Ranaweera, B Doyle, S Seneviratne - Computer Networks, 2025 - Elsevier
Malicious URL classification represents a crucial aspect of cybersecurity. Although existing
work comprises numerous machine learning and deep learning-based URL classification …

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding

TD Nguyen, JH Kim, J Choi, S Choi, J Park… - arXiv preprint arXiv …, 2024 - arxiv.org
The goal of this paper is to accelerate codec-based speech synthesis systems with minimum
sacrifice to speech quality. We propose an enhanced inference method that allows for …

CodecFake-Omni: A Large-Scale Codec-based Deepfake Speech Dataset

J Du, X Chen, H Wu, L Zhang, I Lin, I Chiu… - arXiv preprint arXiv …, 2025 - arxiv.org
With the rapid advancement of codec-based speech generation (CoSG) systems, creating
fake speech that mimics an individual's identity and spreads misinformation has become …

Artificial Intelligence in Creative Industries: Advances Prior to 2025

N Anantrasirichai, F Zhang, D Bull - arXiv preprint arXiv:2501.02725, 2025 - arxiv.org
The rapid advancements in artificial intelligence (AI), particularly in generative AI and large
language models (LLMs), have profoundly impacted the creative industries by enabling …

Recent Advances in Discrete Speech Tokens: A Review

Y Guo, Z Li, H Wang, B Li, C Shao, H Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
The rapid advancement of speech generation technologies in the era of large language
models (LLMs) has established discrete speech tokens as a foundational paradigm for …