Speech technology for healthcare: Opportunities, challenges, and state of the art
Speech technology is not appropriately explored even though modern advances in speech
technology—especially those driven by deep learning (DL) technology—offer …
A survey on voice assistant security: Attacks and countermeasures
Voice assistants (VA) have become prevalent on a wide range of personal devices such as
smartphones and smart speakers. As companies build voice assistants with extra …
Streaming end-to-end speech recognition for mobile devices
End-to-end (E2E) models, which directly predict output character sequences given input
speech, are good candidates for on-device speech recognition. E2E models, however …
Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …
Tacotron: Towards end-to-end speech synthesis
A text-to-speech synthesis system typically consists of multiple stages, such as a text
analysis frontend, an acoustic model and an audio synthesis module. Building these …
WaveNet: A generative model for raw audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …
Speaker-dependent WaveNet vocoder
In this study, we propose a speaker-dependent WaveNet vocoder, a method of synthesizing
speech waveforms with WaveNet, by utilizing acoustic features from existing vocoder as …
Tacotron: A fully end-to-end text-to-speech synthesis model
A text-to-speech synthesis system typically consists of multiple stages, such as a
text analysis frontend, an acoustic model and an audio synthesis module. Building these …
Shallow-Fusion End-to-End Contextual Biasing
Contextual biasing to a specific domain, including a user's song names, app names and
contact names, is an important component of any production-level automatic speech …
Two-pass end-to-end speech recognition
The requirements for many applications of state-of-the-art speech recognition systems
include not only low word error rate (WER) but also low latency. Specifically, for many use …