A deep neural network integrated with filterbank learning for speech recognition

H Seki, K Yamamoto… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Deep neural networks (DNN) have achieved significant success in the field of speech
recognition. One of the main advantages of the DNN is automatic feature extraction without …

A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric

S Nakagawa, K Iwami, Y Fujii, K Yamamoto - Speech Communication, 2013 - Elsevier
For spoken document retrieval, it is crucial to consider Out-of-vocabulary (OOV) and the
mis-recognition of spoken words. Consequently, sub-word unit based recognition and retrieval …

Class-based n-gram language model for new words using out-of-vocabulary to in-vocabulary similarity

W Naptali, M Tsuchiya, S Nakagawa - IEICE TRANSACTIONS on …, 2012 - search.ieice.org
Out-of-vocabulary (OOV) words create serious problems for automatic speech recognition
(ASR) systems. Not only are they misrecognized as in-vocabulary (IV) words with similar …

Topic-Dependent-Class-Based n-Gram Language Model

W Naptali, M Tsuchiya… - IEEE transactions on …, 2012 - ieeexplore.ieee.org
A topic-dependent-class (TDC)-based n-gram language model (LM) is a topic-based LM that
employs a semantic extraction method to reveal latent topic information extracted from noun …

Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction

A Abe, K Yamamoto, S Nakagawa - Interspeech, 2015 - isca-archive.org
Recently, acoustic models based on deep neural networks (DNNs) have been introduced
and showed dramatic improvements over acoustic models based on GMM in a variety of …

Comparison of syllable-based and phoneme-based DNN-HMM in Japanese speech recognition

H Seki, K Yamamoto… - … International Conference of …, 2014 - ieeexplore.ieee.org
Japanese is a syllabic language. Additionally, we have studied syllable-based GMM-HMM for
Japanese speech recognition. In this paper, we investigate the differences of recognition …

Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides

S Tsujimura, K Yamamoto, S Nakagawa - INTERSPEECH, 2017 - isca-archive.org
Because of the spread of the Internet in recent years, e-learning, which is a form of learning
through the Internet, has been used in school education. Many lecture videos delivered at …

High speed spoken term detection by combination of n-gram array of a syllable lattice and LVCSR result for NTCIR-SpokenDoc

K Iwami, S Nakagawa - NTCIR, 2011 - research.nii.ac.jp
For spoken document retrieval, it is very important to consider Out-of-Vocabulary (OOV) and
mis-recognition of spoken words. Therefore, sub-word unit based recognition and retrieval …

Soft-clustering technique for training data in age- and gender-independent speech recognition

D Enami, F Zhu, K Yamamoto… - Proceedings of The …, 2012 - ieeexplore.ieee.org
In this paper, we propose approaches for the Gaussian mixture model (GMM) based soft
clustering of training data and the GMM- or/and hidden Markov model (HMM)-based cluster …

Combination of syllable based N-gram search and word search for spoken term detection through spoken queries and IV/OOV classification

N Sakamoto, K Yamamoto… - 2015 IEEE Workshop on …, 2015 - ieeexplore.ieee.org
This paper presents a Japanese spoken term detection method for spoken queries using a
combination of word-based search and syllable-based N-gram search with in …