Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …

Espnet-codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech

J Shi, J Tian, Y Wu, J Jung, JQ Yip… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Y Yu, J Shi, Y Wu, Y Tang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of
deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled …

Muskits-espnet: A comprehensive toolkit for singing voice synthesis in new paradigm

Y Wu, J Shi, Y Yu, Y Tang, T Qian, Y Lin, J Han… - Proceedings of the …, 2024 - dl.acm.org
This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to
Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

J Shi, X Ma, H Inaguma, A Sun, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Speech discrete representation has proven effective in various downstream applications
due to its superior compression rate of the waveform, fast convergence during training, and …

Ssdm: Scalable speech dysfluency modeling

J Lian, X Zhou, Z Ezzes, J Vonk, B Morin… - arXiv preprint arXiv …, 2024 - arxiv.org
Speech dysfluency modeling is the core module for spoken language learning and speech
therapy. However, there are three challenges. First, current state-of-the-art solutions …

SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

Y Tang, Y Wu, J Shi, Q Jin - arXiv preprint arXiv:2406.08905, 2024 - arxiv.org
Discrete representation has shown advantages in speech generation tasks, wherein
discrete tokens are derived by discretizing hidden features from self-supervised learning …

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

Y Zang, J Shi, Y Zhang, R Yamamoto, J Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent singing voice synthesis and conversion advancements necessitate robust singing
voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to …

TokSing: Singing Voice Synthesis based on Discrete Tokens

Y Wu, J Shi, Y Tang, S Yang, Q Jin - arXiv preprint arXiv:2406.08416, 2024 - arxiv.org
Recent advancements in speech synthesis witness significant benefits by leveraging
discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer …

How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario

SH Wang, ZC Chen, J Shi, MT Chuang, GT Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
The utilization of speech Self-Supervised Learning (SSL) models achieves impressive
performance on Automatic Speech Recognition (ASR). However, in low-resource language …