[PDF][PDF] Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
A review of recurrent neural networks: LSTM cells and network architectures
Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019 - direct.mit.edu
Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …
Neural motifs: Scene graph parsing with global context
We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …
[PDF][PDF] Semi-orthogonal low-rank matrix factorization for deep neural networks.
Abstract Time Delay Neural Networks (TDNNs), also known as onedimensional
Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural …
Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural …
Develo** real-time streaming transformer transducer for speech recognition on large-scale dataset
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …
areas including speech recognition. However, compared to LSTM models, the heavy …
Deep learning enabled semantic communications with speech recognition and synthesis
In this paper, we develop a deep learning based semantic communication system for
speech transmission, named DeepSC-ST. We take the speech recognition and speech …
speech transmission, named DeepSC-ST. We take the speech recognition and speech …
Pixel recurrent neural networks
Modeling the distribution of natural images is a landmark problem in unsupervised learning.
This task requires an image model that is at once expressive, tractable and scalable. We …
This task requires an image model that is at once expressive, tractable and scalable. We …
Speech emotion recognition from 3D log-mel spectrograms with deep learning network
H Meng, T Yan, F Yuan, H Wei - IEEE access, 2019 - ieeexplore.ieee.org
Speech emotion recognition is a vital and challenging task that the feature extraction plays a
significant role in the SER performance. With the development of deep learning, we put our …
significant role in the SER performance. With the development of deep learning, we put our …
Training deep nets with sublinear memory cost
We propose a systematic approach to reduce the memory consumption of deep neural
network training. Specifically, we design an algorithm that costs O (sqrt (n)) memory to train …
network training. Specifically, we design an algorithm that costs O (sqrt (n)) memory to train …
Highway networks
There is plenty of theoretical and empirical evidence that depth of neural networks is a
crucial ingredient for their success. However, network training becomes more difficult with …
crucial ingredient for their success. However, network training becomes more difficult with …