Speech emotion recognition using 3D convolutions and attention-based sliding recurrent networks with auditory front-ends

Z Peng, X Li, Z Zhu, M Unoki, J Dang, M Akagi - IEEE Access, 2020 - ieeexplore.ieee.org
Emotion information from speech can effectively help robots understand speaker's intentions
in natural human-robot interaction. The human auditory system can easily track temporal …

Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech

Z Peng, J Dang, M Unoki, M Akagi - Neural Networks, 2021 - Elsevier
Continuous dimensional emotion recognition from speech helps robots or virtual agents
capture the temporal dynamics of a speaker's emotional state in natural human–robot …

Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network

N Li, L Wang, M Ge, M Unoki, S Li, J Dang - Speech Communication, 2024 - Elsevier
Deep learning has revolutionized voice activity detection (VAD) by offering promising
solutions. However, directly applying traditional features, such as raw waveforms and Mel …

Relationship between contributions of temporal amplitude envelope of speech and modulation transfer function in room acoustics to perception of noise-vocoded …

M Unoki, Z Zhu - Acoustical Science and Technology, 2020 - jstage.jst.go.jp
Speech signals can be represented as a sum of amplitude-modulated frequency bands. This
sum can also be regarded as a temporal amplitude envelope (TAE) with temporal fine …

Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function

T Ngo, R Kubo, M Akagi - Speech Communication, 2021 - Elsevier
This study focuses on identifying effective features for controlling speech to increase speech
intelligibility under adverse conditions. Previous approaches either cancel noise throughout …

Envelope estimation using geometric properties of a discrete real signal

CHT Santos, V Pereira - Digital Signal Processing, 2022 - Elsevier
Despite being an elusive concept, the temporal amplitude envelope of a signal is essential
for its complete characterization, being the primary information-carrying medium in spoken …

Contribution of modulation spectral features on the perception of vocal-emotion using noise-vocoded speech

Z Zhu, R Miyauchi, Y Araki, M Unoki - Acoustical Science and …, 2018 - jstage.jst.go.jp
Previous studies on noise-vocoded speech showed that the temporal modulation cues
provided by the temporal envelope play an important role in the perception of vocal emotion …

Ability of Human Auditory Perception to Distinguish Human-imitated Speech

K Zaman, K Li, IJAM Samiul, Y Uezu, S Kidani… - IEEE …, 2025 - ieeexplore.ieee.org
Distinguishing human-imitated speech from genuine speech presents a significant
challenge for listeners due to their natural resemblance. Human auditory perception (HAP) …

Study on the perception of nonlinguistic information of noise-vocoded speech under noise and/or reverberation conditions

Z Zhu, M Kawamura, M Unoki - Acoustical Science and Technology, 2022 - jstage.jst.go.jp
It has been known that noise and reverberation greatly affect the perception of linguistic
information, in particular speech intelligibility. However, the effect of noise and reverberation …

Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

J Santoso, T Yamada, S Makino - 2019 Asia-Pacific Signal and …, 2019 - ieeexplore.ieee.org
In this paper, we address the problem of classifying four common utterance characteristics
related to the utterance speed, which cause speech recognition errors. We previously …