A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

ContentVec: An improved self-supervised speech representation by disentangling speakers

K Qian, Y Zhang, H Gao, J Ni, CI Lai… - International …, 2022 - proceedings.mlr.press
Self-supervised learning in speech involves training a speech representation network on a
large-scale unannotated speech corpus, and then applying the learned representations to …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

A review of synthetic image data and its use in computer vision

K Man, J Chahl - Journal of Imaging, 2022 - mdpi.com
Development of computer vision algorithms using convolutional neural networks and deep
learning has necessitated ever greater amounts of annotated and labelled data to produce …

Faulty rolling bearing digital twin model and its application in fault diagnosis with imbalanced samples

Y Qin, H Liu, Y Mao - Advanced Engineering Informatics, 2024 - Elsevier
The simulation signals generated by the bearing dynamics model have a big gap with the
actual signals, which limits their efficacy in bearing fault diagnosis. Therefore, it is valuable …

An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

AGAIN-VC: A one-shot voice conversion using activation guidance and adaptive instance normalization

YH Chen, DY Wu, TH Wu, H Lee - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, voice conversion (VC) has been widely studied. Many VC systems use
disentangle-based learning techniques to separate the speaker and the linguistic content …

Data augmentation for deep neural networks model in EEG classification task: a review

C He, J Liu, Y Zhu, W Du - Frontiers in Human Neuroscience, 2021 - frontiersin.org
Classification of electroencephalogram (EEG) is a key approach to measure the rhythmic
oscillations of neural activity, which is one of the core technologies of brain-computer …