A review of paralinguistic information processing for natural speech communication

Y Yamashita - Acoustical Science and Technology, 2013 - jstage.jst.go.jp
Speech conveys not only linguistic information but also supplemental information that is not
inferable from written language, such as attitude, speaking style, intention, emotion, mental …

Controllable emphatic speech synthesis based on forward attention for expressive speech synthesis

L Liu, J Hu, Z Wu, S Yang, S Yang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
In speech interaction scenarios, speech emphasis is essential for expressing the underlying
intention and attitude. Recently, end-to-end emphatic speech synthesis greatly improves the …

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data

Y Ning, Z Wu, R Li, J Jia, M Xu… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) has
achieved state-of-the-art performance in many sequence processing problems given its …

Emphatic speech generation with conditioned input layer and bidirectional LSTMs for expressive speech synthesis

R Li, Z Wu, Y Huang, J Jia, H Meng… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
By highlighting the focus of an utterance to draw attention, emphasis in speech interaction
plays an important role for speaker intention expressing and understanding. Therefore …

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training

F Meng, Z Wu, J Jia, H Meng, L Cai - Multimedia Tools and Applications, 2014 - Springer
Emphasis plays an important role in expressive speech synthesis in highlighting the focus of
an utterance to draw the attention of the listener. We present a hidden Markov model (HMM) …

Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis

Y Maeno, T Nose, T Kobayashi, T Koriyama… - Speech …, 2014 - Elsevier
This paper proposes an unsupervised labeling technique using phrase-level prosodic
contexts for HMM-based expressive speech synthesis, which enables users to manually …

Generating emphatic speech with hidden Markov model for expressive speech synthesis

Z Wu, Y Ning, X Zang, J Jia, F Meng, H Meng… - Multimedia tools and …, 2015 - Springer
Emphasis plays an important role in expressive speech synthesis in highlighting the focus of
an utterance to draw the attention of the listener. As there are only a few emphasized words …

HMM-based Thai speech synthesis using unsupervised stress context labeling

D Moungsri, T Koriyama… - Signal and Information …, 2014 - ieeexplore.ieee.org
This paper describes an approach to HMM-based Thai speech synthesis using stress
context. It has been shown that context related to stressed/unstressed syllable information …

Using tilt for automatic emphasis detection with Bayesian networks

Y Ning, Z Wu, X Lou, HM Meng, J Jia, L Cai - INTERSPEECH, 2015 - se.cuhk.edu.hk
This paper proposes a new framework for emphasis detection from natural speech, where
emphasis refers to a word or part of a word perceived as standing out from its surrounding …

Statistical model training technique based on speaker clustering approach for HMM-based speech synthesis

Y Ijima, N Miyazaki, H Mizuno, S Sakauchi - Speech Communication, 2015 - Elsevier
This paper proposes an average voice model training technique based on a speaker
clustering approach to generate synthetic speech with enhanced similarity to the target …