- Academic Search

Speech emotion recognition using 3d convolutions and attention-based sliding recurrent networks with auditory front-ends

Z Peng, X Li, Z Zhu, M Unoki, J Dang, M Akagi - IEEE Access, 2020 - ieeexplore.ieee.org

Emotion information from speech can effectively help robots understand speaker's intentions
in natural human-robot interaction. The human auditory system can easily track temporal …

Cited by 82 · All 5 versions

Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech

Z Peng, J Dang, M Unoki, M Akagi - Neural Networks, 2021 - Elsevier

Continuous dimensional emotion recognition from speech helps robots or virtual agents
capture the temporal dynamics of a speaker's emotional state in natural human–robot …

Cited by 50 · All 4 versions

[PDF] researchgate.net

Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network

N Li, L Wang, M Ge, M Unoki, S Li, J Dang - Speech Communication, 2024 - Elsevier

Deep learning has revolutionized voice activity detection (VAD) by offering promising
solutions. However, directly applying traditional features, such as raw waveforms and Mel …

Cited by 7 · All 6 versions

[PDF] jst.go.jp

Relationship between contributions of temporal amplitude envelope of speech and modulation transfer function in room acoustics to perception of noise-vocoded …

M Unoki, Z Zhu - Acoustical Science and Technology, 2020 - jstage.jst.go.jp

Speech signals can be represented as a sum of amplitude-modulated frequency bands. This
sum can also be regarded as a temporal amplitude envelope (TAE) with temporal fine …

Cited by 20 · All 3 versions

[PDF] sciencedirect.com

Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function

T Ngo, R Kubo, M Akagi - Speech Communication, 2021 - Elsevier

This study focuses on identifying effective features for controlling speech to increase speech
intelligibility under adverse conditions. Previous approaches either cancel noise throughout …

Cited by 11 · All 4 versions

[PDF] academia.edu

Envelope estimation using geometric properties of a discrete real signal

CHT Santos, V Pereira - Digital Signal Processing, 2022 - Elsevier

Despite being an elusive concept, the temporal amplitude envelope of a signal is essential
for its complete characterization, being the primary information-carrying medium in spoken …

Cited by 10 · All 4 versions

[PDF] jst.go.jp

Contribution of modulation spectral features on the perception of vocal-emotion using noise-vocoded speech

Z Zhu, R Miyauchi, Y Araki, M Unoki - Acoustical Science and …, 2018 - jstage.jst.go.jp

Previous studies on noise-vocoded speech showed that the temporal modulation cues
provided by the temporal envelope play an important role in the perception of vocal emotion …

Cited by 17 · All 2 versions

[PDF] ieee.org

Ability of Human Auditory Perception to Distinguish Human-imitated Speech

K Zaman, K Li, IJAM Samiul, Y Uezu, S Kidani… - IEEE …, 2025 - ieeexplore.ieee.org

Distinguishing human-imitated speech from genuine speech presents a significant
challenge for listeners due to their natural resemblance. Human auditory perception (HAP) …

All 2 versions

[PDF] jst.go.jp

Study on the perception of nonlinguistic information of noise-vocoded speech under noise and/or reverberation conditions

Z Zhu, M Kawamura, M Unoki - Acoustical Science and Technology, 2022 - jstage.jst.go.jp

It has been known that noise and reverberation greatly affect the perception of linguistic
information, in particular speech intelligibility. However, the effect of noise and reverberation …

Cited by 4 · All 2 versions

[PDF] tsukuba.ac.jp

Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

J Santoso, T Yamada, S Makino - 2019 Asia-Pacific Signal and …, 2019 - ieeexplore.ieee.org

In this paper, we address the problem of classifying four common utterance characteristics
related to the utterance speed, which cause speech recognition errors. We previously …

Cited by 9 · All 8 versions

Contributions of temporal cue on the perception of speaker individuality and vocal emotion...