Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

A review of recurrent neural networks: LSTM cells and network architectures

Y Yu, X Si, C Hu, J Zhang - Neural computation, 2019 - direct.mit.edu
Recurrent neural networks (RNNs) have been widely adopted in research areas concerned
with sequential data, such as text, audio, and video. However, RNNs consisting of sigma …

Neural motifs: Scene graph parsing with global context

R Zellers, M Yatskar, S Thomson… - Proceedings of the …, 2018 - openaccess.thecvf.com
We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …

Semi-orthogonal low-rank matrix factorization for deep neural networks

D Povey, G Cheng, Y Wang, K Li, H Xu… - Interspeech, 2018 - academia.edu
Time Delay Neural Networks (TDNNs), also known as one-dimensional
Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural …
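As a rough illustration of the TDNN / 1-d CNN equivalence stated above (a minimal sketch, not code from the paper), a TDNN layer that splices input frames at context offsets {-2, 0, 2} computes the same output as a "valid" 1-d convolution with kernel size 3 and dilation 2. The function names, the toy dimensions, and the omission of bias and nonlinearity are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per output row.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def tdnn_layer(frames: List[Vector], weights: List[Matrix], offsets: List[int]) -> List[Vector]:
    # TDNN view (hypothetical helper): output at frame t sums W_c @ x[t + c] over the
    # context offsets c. Only frames whose full context lies inside the input are
    # produced (no padding); bias and nonlinearity are omitted.
    lo, hi = min(offsets), max(offsets)
    out = []
    for t in range(-lo, len(frames) - hi):
        acc = [0.0] * len(weights[0])
        for w, c in zip(weights, offsets):
            acc = add(acc, matvec(w, frames[t + c]))
        out.append(acc)
    return out


def conv1d_dilated(frames: List[Vector], kernel: List[Matrix], dilation: int) -> List[Vector]:
    # 1-d CNN view (hypothetical helper): a "valid" dilated convolution, written as
    # cross-correlation like most deep-learning libraries: y[t] = sum_k K[k] @ x[t + k*dilation].
    span = dilation * (len(kernel) - 1)
    out = []
    for t in range(len(frames) - span):
        acc = [0.0] * len(kernel[0])
        for k, w in enumerate(kernel):
            acc = add(acc, matvec(w, frames[t + k * dilation]))
        out.append(acc)
    return out


# Toy example: 10 frames of 2-dim features, 3 output units, offsets {-2, 0, 2}.
frames = [[float(t), float(t % 3)] for t in range(10)]
weights = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],    # W_{-2}
    [[2.0, -1.0], [0.5, 0.5], [0.0, 3.0]],   # W_{0}
    [[-1.0, 1.0], [1.0, 2.0], [0.0, -1.0]],  # W_{+2}
]
a = tdnn_layer(frames, weights, [-2, 0, 2])
b = conv1d_dilated(frames, weights, dilation=2)
print(len(a), a == b)  # prints: 6 True
```

Real TDNN layers also add a bias and a nonlinearity; they are left out here so that the equivalence between the two views stays visible.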

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …

Deep learning enabled semantic communications with speech recognition and synthesis

Z Weng, Z Qin, X Tao, C Pan, G Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this paper, we develop a deep learning based semantic communication system for
speech transmission, named DeepSC-ST. We take the speech recognition and speech …

Pixel recurrent neural networks

A Van Den Oord, N Kalchbrenner… - … on machine learning, 2016 - proceedings.mlr.press
Modeling the distribution of natural images is a landmark problem in unsupervised learning.
This task requires an image model that is at once expressive, tractable and scalable. We …

Speech emotion recognition from 3D log-mel spectrograms with deep learning network

H Meng, T Yan, F Yuan, H Wei - IEEE access, 2019 - ieeexplore.ieee.org
Speech emotion recognition (SER) is a vital and challenging task in which feature extraction plays a
significant role in SER performance. With the development of deep learning, we put our …

Training deep nets with sublinear memory cost

T Chen, B Xu, C Zhang, C Guestrin - arXiv preprint arXiv:1604.06174, 2016 - arxiv.org
We propose a systematic approach to reduce the memory consumption of deep neural
network training. Specifically, we design an algorithm that costs O(sqrt(n)) memory to train …
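The O(sqrt(n)) figure comes from checkpointing: keep only about every sqrt(n)-th activation during the forward pass, then recompute each segment from its checkpoint during the backward pass. The toy below is a minimal sketch of that idea on a chain of n scalar tanh layers with hand-written derivatives. It is not the authors' implementation, and the helper names (`grad_full`, `grad_checkpointed`) are hypothetical.

```python
import math


def layer(w: float, x: float) -> float:
    # Toy scalar layer: y = tanh(w * x).
    return math.tanh(w * x)


def layer_grad(w: float, x: float) -> float:
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2).
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def grad_full(weights, x0):
    # Baseline: store every activation (n + 1 values, O(n) memory).
    acts = [x0]
    for w in weights:
        acts.append(layer(w, acts[-1]))
    g = 1.0
    for i in reversed(range(len(weights))):
        g *= layer_grad(weights[i], acts[i])
    return acts[-1], g, len(acts)


def grad_checkpointed(weights, x0):
    # Store only the inputs of segments of k ~ sqrt(n) layers, then recompute
    # each segment's activations on the way back. Returns (output, dy/dx0,
    # peak number of activations held at once).
    n = len(weights)
    k = max(1, math.isqrt(n))
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer(w, x)
        if (i + 1) % k == 0 and i + 1 < n:
            checkpoints[i + 1] = x
    output = x
    peak = len(checkpoints)
    g = 1.0
    for s in sorted(checkpoints, reverse=True):  # last segment first
        e = min(s + k, n)
        # Recompute the inputs of layers s .. e-1 from the segment's checkpoint.
        seg = [checkpoints[s]]
        for i in range(s, e - 1):
            seg.append(layer(weights[i], seg[-1]))
        peak = max(peak, len(checkpoints) + len(seg))
        for i in reversed(range(s, e)):
            g *= layer_grad(weights[i], seg[i - s])
    return output, g, peak


ws = [0.5 + 0.01 * i for i in range(100)]
print(grad_full(ws, 0.3)[2], grad_checkpointed(ws, 0.3)[2])  # prints: 101 20
```

With 100 layers, the checkpointed version holds 10 checkpoints plus one 10-activation segment (20 values) instead of 101, at the cost of recomputing roughly one extra forward pass.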

Highway networks

RK Srivastava, K Greff, J Schmidhuber - arXiv preprint arXiv:1505.00387, 2015 - arxiv.org
There is plenty of theoretical and empirical evidence that depth of neural networks is a
crucial ingredient for their success. However, network training becomes more difficult with …