A comparative study on Transformer vs RNN in speech applications
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …
AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the
largest corpus which is suitable for conducting the speech recognition research and building …
Audio augmentation for speech recognition
Data augmentation is a common strategy adopted to increase the quantity of training data,
avoid overfitting and improve robustness of the models. In this paper, we investigate audio …
Pattern mining approaches used in sensor-based biometric recognition: a review
Sensing technologies place significant interest in the use of biometrics for the recognition
and assessment of individuals. Pattern mining techniques have established a critical step in …
Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM
T Hori, S Watanabe, Y Zhang, W Chan - ar** ASR systems for new languages, by eliminating the need for linguistic …
UniCATS: A unified context-aware text-to-speech framework with contextual VQ-diffusion and vocoding
The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens,
has been proven superior to traditional acoustic feature mel-spectrograms in terms of …
Emotion recognition by fusing time synchronous and time asynchronous representations
In this paper, a novel two-branch neural network model structure is proposed for multimodal
emotion recognition, which consists of a time synchronous branch (TSB) and a time …