A comparative study on Transformer vs RNN in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline

H Bu, J Du, X Na, B Wu, H Zheng - … of the oriental chapter of the …, 2017 - ieeexplore.ieee.org
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the
largest corpus which is suitable for conducting the speech recognition research and building …

Audio augmentation for speech recognition

T Ko, V Peddinti, D Povey, S Khudanpur - Interspeech, 2015 - isca-archive.org
Data augmentation is a common strategy adopted to increase the quantity of training data,
avoid overfitting and improve robustness of the models. In this paper, we investigate audio …

Pattern mining approaches used in sensor-based biometric recognition: a review

J Chaki, N Dey, F Shi, RS Sherratt - IEEE Sensors Journal, 2019 - ieeexplore.ieee.org
Sensing technologies place significant interest in the use of biometrics for the recognition
and assessment of individuals. Pattern mining techniques have established a critical step in …

UniCATS: A unified context-aware text-to-speech framework with contextual VQ-diffusion and vocoding

C Du, Y Guo, F Shen, Z Liu, Z Liang, X Chen… - Proceedings of the …, 2024 - ojs.aaai.org
The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens,
has been proven superior to traditional acoustic feature mel-spectrograms in terms of …

Emotion recognition by fusing time synchronous and time asynchronous representations

W Wu, C Zhang, PC Woodland - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In this paper, a novel two-branch neural network model structure is proposed for multimodal
emotion recognition, which consists of a time synchronous branch (TSB) and a time …