Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Transfer learning for speech and language processing

D Wang, TF Zheng - 2015 Asia-Pacific Signal and Information …, 2015 - ieeexplore.ieee.org
Transfer learning is a vital technique that generalizes models trained for one setting or task
to other settings or tasks. For example in speech recognition, an acoustic model trained for …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

A survey of deep learning and its applications: a new paradigm to machine learning

S Dargan, M Kumar, MR Ayyagari, G Kumar - Archives of Computational …, 2020 - Springer
Nowadays, deep learning is a current and a stimulating field of machine learning. Deep
learning is the most effective, supervised, time and cost efficient machine learning approach …

Neural voice cloning with a few samples

S Arik, J Chen, K Peng, W Ping… - Advances in neural …, 2018 - proceedings.neurips.cc
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a
neural voice cloning system that learns to synthesize a person's voice from only a few audio …

Char2wav: End-to-end speech synthesis

J Sotelo, S Mehri, K Kumar, JF Santos, K Kastner… - 2017 - openreview.net
We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two
components: a reader and a neural vocoder. The reader is an encoder-decoder model with …

Deep voice 2: Multi-speaker neural text-to-speech

A Gibiansky, S Arik, G Diamos, J Miller… - Advances in neural …, 2017 - proceedings.neurips.cc
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional
trainable speaker embeddings to generate different voices from a single model. As a starting …

Speech enhancement using self-adaptation and multi-head self-attention

Y Koizumi, K Yatabe, M Delcroix… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper investigates a self-adaptation method for speech enhancement using auxiliary
speaker-aware features; we extract a speaker representation used for adaptation directly …

Deep voice 2: Multi-speaker neural text-to-speech

S Arik, G Diamos, A Gibiansky, J Miller, K Peng… - arXiv preprint arXiv …, 2017 - arxiv.org
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional
trainable speaker embeddings to generate different voices from a single model. As a starting …

Silent speech interfaces for speech restoration: A review

JA Gonzalez-Lopez, A Gomez-Alanis… - IEEE …, 2020 - ieeexplore.ieee.org
This review summarises the status of silent speech interface (SSI) research. SSIs rely on
non-acoustic biosignals generated by the human body during speech production to enable …