- Academic Search

[HTML] Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com

The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …

Cited by 7

[PDF] arxiv.org

Espnet-codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …

Cited by 7

[PDF] arxiv.org

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Y Yu, J Shi, Y Wu, Y Tang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of
deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled …

Cited by 3

[PDF] arxiv.org

Muskits-espnet: A comprehensive toolkit for singing voice synthesis in new paradigm

Y Wu, J Shi, Y Yu, Y Tang, T Qian, Y Lin, J Han… - Proceedings of the …, 2024 - dl.acm.org

This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to
Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …

Cited by 2

[PDF] arxiv.org

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

J Shi, X Ma, H Inaguma, A Sun, S Watanabe - arxiv preprint arxiv …, 2024 - arxiv.org

Speech discrete representation has proven effective in various downstream applications
due to its superior compression rate of the waveform, fast convergence during training, and …

Cited by 7

[PDF] arxiv.org

Ssdm: Scalable speech dysfluency modeling

J Lian, X Zhou, Z Ezzes, J Vonk, B Morin… - arxiv preprint arxiv …, 2024 - arxiv.org

Speech dysfluency modeling is the core module for spoken language learning, and speech
therapy. However, there are three challenges. First, current state-of-the-art solutions …

Cited by 2

[PDF] arxiv.org

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Y Tang, Y Wu, J Shi, Q Jin - arxiv preprint arxiv:2406.08905, 2024 - arxiv.org

Discrete representation has shown advantages in speech generation tasks, wherein
discrete tokens are derived by discretizing hidden features from self-supervised learning …

Cited by 4

[PDF] arxiv.org

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Y Zang, J Shi, Y Zhang, R Yamamoto, J Han… - arxiv preprint arxiv …, 2024 - arxiv.org

Recent singing voice synthesis and conversion advancements necessitate robust singing
voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to …

Cited by 9

[PDF] arxiv.org

TokSing: Singing Voice Synthesis based on Discrete Tokens

Y Wu, J Shi, Y Tang, S Yang, Q Jin - arxiv preprint arxiv:2406.08416, 2024 - arxiv.org

Recent advancements in speech synthesis witness significant benefits by leveraging
discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer …

Cited by 5

[PDF] arxiv.org

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

SH Wang, ZC Chen, J Shi, MT Chuang, GT Lin… - arxiv preprint arxiv …, 2024 - arxiv.org

The utilization of speech Self-Supervised Learning (SSL) models achieves impressive
performance on Automatic Speech Recognition (ASR). However, in low-resource language …

Cited by 1


Multi-resolution HuBERT: Multi-resolution speech self-supervised learning with masked unit...
