Multimodal intelligence: Representation learning, information fusion, and applications
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …
Transfer learning for speech and language processing
Transfer learning is a vital technique that generalizes models trained for one setting or task
to other settings or tasks. For example, in speech recognition, an acoustic model trained for …
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
A survey of deep learning and its applications: a new paradigm to machine learning
Nowadays, deep learning is a current and stimulating field of machine learning. Deep
learning is the most effective, supervised, time- and cost-efficient machine learning approach …
Neural voice cloning with a few samples
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a
neural voice cloning system that learns to synthesize a person's voice from only a few audio …
Char2wav: End-to-end speech synthesis
We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two
components: a reader and a neural vocoder. The reader is an encoder-decoder model with …
Deep voice 2: Multi-speaker neural text-to-speech
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional
trainable speaker embeddings to generate different voices from a single model. As a starting …
Speech enhancement using self-adaptation and multi-head self-attention
This paper investigates a self-adaptation method for speech enhancement using auxiliary
speaker-aware features; we extract a speaker representation used for adaptation directly …
Silent speech interfaces for speech restoration: A review
This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-
acoustic biosignals generated by the human body during speech production to enable …