[HTML][HTML] Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve
Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, eg, forensic science …
extensively being harnessed across a diverse range of domains, eg, forensic science …
Espnet-codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech
Neural codecs have become crucial to recent speech and audio generation research. In
addition to signal compression capabilities, discrete codecs have also been found to …
addition to signal compression capabilities, discrete codecs have also been found to …
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of
deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled …
deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled …
Muskits-espnet: A comprehensive toolkit for singing voice synthesis in new paradigm
This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to
Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …
Singing Voice Synthesis (SVS) through the application of pretrained audio models in both …
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Speech discrete representation has proven effective in various downstream applications
due to its superior compression rate of the waveform, fast convergence during training, and …
due to its superior compression rate of the waveform, fast convergence during training, and …
Ssdm: Scalable speech dysfluency modeling
Speech dysfluency modeling is the core module for spoken language learning, and speech
therapy. However, there are three challenges. First, current state-of-the-art solutions\cite …
therapy. However, there are three challenges. First, current state-of-the-art solutions\cite …
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Discrete representation has shown advantages in speech generation tasks, wherein
discrete tokens are derived by discretizing hidden features from self-supervised learning …
discrete tokens are derived by discretizing hidden features from self-supervised learning …
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Recent singing voice synthesis and conversion advancements necessitate robust singing
voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to …
voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to …
TokSing: Singing Voice Synthesis based on Discrete Tokens
Recent advancements in speech synthesis witness significant benefits by leveraging
discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer …
discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer …
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
The utilization of speech Self-Supervised Learning (SSL) models achieves impressive
performance on Automatic Speech Recognition (ASR). However, in low-resource language …
performance on Automatic Speech Recognition (ASR). However, in low-resource language …