E-branchformer: Branchformer with enhanced merging for speech recognition

K Kim, F Wu, Y Peng, J Pan, P Sridhar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies

F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …

Transformer-based end-to-end speech recognition with local dense synthesizer attention

M Xu, S Li, XL Zhang - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
Recently, several studies reported that dot-product self-attention (SA) may not be
indispensable to the state-of-the-art Transformer models. Motivated by the fact that dense …

Efficient conformer-based speech recognition with linear attention

S Li, M Xu, XL Zhang - 2021 Asia-Pacific Signal and …, 2021 - ieeexplore.ieee.org
Recently, conformer-based end-to-end automatic speech recognition, which outperforms
recurrent neural network based ones, has received much attention. Although the parallel …

Speech dereverberation with frequency domain autoregressive modeling

A Purushothaman, D Dutta, R Kumar… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Speech applications in far-field real world settings often deal with signals that are corrupted
by reverberation. The task of dereverberation constitutes an important step to improve the …

Event specific attention for polyphonic sound event detection

H Sundar, M Sun, C Wang - 2021 - amazon.science
The concept of multi-headed self attention (MHSA) introduced as a critical building block of a
Transformer Encoder/Decoder Module has made a significant impact in the areas of natural …

Towards efficient 3D human motion prediction using deformable transformer-based adversarial network

Y Hua, F Xuanzhe, H Yaqing, L Yi, K Cai… - … on Robotics and …, 2022 - ieeexplore.ieee.org
Human motion prediction is a crucial step for achieving human-robot interactions. While
recent transformer-based methods have shown great potentials in 3D human motion …

Conformer-based end-to-end speech recognition with rotary position embedding

S Li, M Xu, XL Zhang - 2021 Asia-Pacific Signal and …, 2021 - ieeexplore.ieee.org
Transformer-based end-to-end speech recognition models have received considerable
attention in recent years due to their high training speed and ability to model a long-range …

A lightweight dynamic filter for keyword spotting

D Kim, K Ko, J Kwak, DK Han… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Keyword Spotting (KWS) from speech signals is widely applied to perform fully hands-free
speech recognition. The KWS network is designed as a small-footprint model so it can …