- Academic Search

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Cited by 4

ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …

Cited by 7

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Y Yu, J Shi, Y Wu, Y Tang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of
deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled …

Cited by 3

Preference Alignment Improves Language Model-Based TTS

J Tian, C Zhang, J Shi, H Zhang, J Yu… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent advancements in text-to-speech (TTS) have shown that language model (LM)-based
systems offer competitive performance to their counterparts. Further optimization can be …

Cited by 3

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

J Shi, H Shim, J Tian, S Arora, H Wu… - arxiv preprint arxiv …, 2024 - arxiv.org

In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for
various speech, audio, and music signals. The toolkit features a Pythonic interface with …

Cited by 1

Recent Advances in Discrete Speech Tokens: A Review

Y Guo, Z Li, H Wang, B Li, C Shao, H Zhang… - arxiv preprint arxiv …, 2025 - arxiv.org

The rapid advancement of speech generation technologies in the era of large language
models (LLMs) has established discrete speech tokens as a foundational paradigm for …

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

Y Guo, Z Li, J Li, C Du, H Wang, S Wang… - arxiv preprint arxiv …, 2024 - arxiv.org

We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice
conversion (VC). We use discrete tokens from speech self-supervised models as the content …

Cited by 2

Recursive Attentive Pooling For Extracting Speaker Embeddings From Multi-Speaker Recordings

S Horiguchi, A Ando, T Moriya… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

This paper proposes a method for extracting speaker embedding for each speaker from a
variable-length recording containing multiple speakers. Speaker embeddings are crucial not …

Cited by 1

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Y Guo, Z Li, C Du, H Wang, X Chen, K Yu - arxiv preprint arxiv …, 2024 - arxiv.org

Although discrete speech tokens have exhibited strong potential for language model-based
speech generation, their high bitrates and redundant timbre information restrict the …

Cited by 1

ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration

M Someki, K Choi, S Arora, W Chen… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit
ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on …

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised...
