Google Scholar

Espnet-codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …

Cited by 7

[PDF] arxiv.org

Preference tuning with human feedback on language, speech, and vision tasks: A survey

GI Winata, H Zhao, A Das, W Tang, DD Yao… - arxiv preprint arxiv …, 2024 - arxiv.org

Preference tuning is a crucial process for aligning deep generative models with human
preferences. This survey offers a thorough overview of recent advancements in preference …

Cited by 7

[PDF] arxiv.org

URGENT challenge: Universality, robustness, and generalizability for speech enhancement

W Zhang, R Scheibler, K Saijo, S Cornell, C Li… - arxiv preprint arxiv …, 2024 - arxiv.org

The last decade has witnessed significant advancements in deep learning-based speech
enhancement (SE). However, most existing SE research has limitations on the coverage of …

Cited by 3

[PDF] arxiv.org

Mos-bench: Benchmarking generalization abilities of subjective speech quality assessment models

WC Huang, E Cooper, T Toda - arxiv preprint arxiv:2411.03715, 2024 - arxiv.org

Subjective speech quality assessment (SSQA) is critical for evaluating speech samples as
perceived by human listeners. While model-based SSQA has enjoyed great success thanks …

Cited by 2

[PDF] arxiv.org

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

J Shi, H Shim, J Tian, S Arora, H Wu… - arxiv preprint arxiv …, 2024 - arxiv.org

In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for
various speech, audio, and music signals. The toolkit features a Pythonic interface with …

Cited by 2

[PDF] arxiv.org

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

S Wang, W Yu, Y Yang, C Tang, Y Li, J Zhuang… - arxiv preprint arxiv …, 2024 - arxiv.org

Speech quality assessment typically requires evaluating audio from multiple aspects, such
as mean opinion score (MOS) and speaker similarity (SIM) etc., which can be challenging to …

Cited by 1

Massively Multilingual Forced Aligner Leveraging Self-Supervised Discrete Units

H Inaguma, I Kulikov, Z Ni, S Popuri… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

We propose a massively multilingual speech-to-text neural forced aligner that supports 98
languages with a single architecture. The aligner takes self-supervised discrete acoustic …

[PDF] arxiv.org

Deep Speech Synthesis from Multimodal Articulatory Representations

P Wu, B Yu, K Scheck, AW Black… - arxiv preprint arxiv …, 2024 - arxiv.org

The amount of articulatory data available for training deep learning models is much less
compared to acoustic speech data. In order to improve articulatory-to-acoustic synthesis …

[PDF] arxiv.org

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

J Zhang, J Yang, Z Fang, Y Wang, Z Zhang… - arxiv preprint arxiv …, 2025 - arxiv.org

We introduce AnyEnhance, a unified generative model for voice enhancement that
processes both speech and singing voices. Based on a masked generative model …

[PDF] arxiv.org

BnTTS: Few-Shot Speaker Adaptation in Low-Resource Setting

MJI Basher, M Kowsher, MS Islam, RN Nandi… - arxiv preprint arxiv …, 2025 - arxiv.org

This paper introduces BnTTS (Bangla Text-To-Speech), the first framework for Bangla
speaker adaptation-based TTS, designed to bridge the gap in Bangla speech synthesis …

Saved to My library

SpeechBERTScore: Reference-aware automatic evaluation of speech generation leveraging nlp...

Espnet-codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech