Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

Fast conformer with linearly scalable attention for efficient speech recognition

D Rekesh, NR Koluguri, S Kriman… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Conformer-based models have become the dominant end-to-end architecture for speech
processing tasks. With the objective of enhancing the conformer architecture for efficient …

NTIRE 2022 challenge on perceptual image quality assessment

J Gu, H Cai, C Dong, JS Ren… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment
(IQA), held in conjunction with the New Trends in Image Restoration and Enhancement …

Audio-visual efficient conformer for robust speech recognition

M Burchi, R Timofte - Proceedings of the IEEE/CVF winter …, 2023 - openaccess.thecvf.com
End-to-end Automatic Speech Recognition (ASR) systems based on neural
networks have seen large improvements in recent years. The availability of large scale hand …

Automatic Speech Recognition: A survey of deep learning techniques and approaches

H Ahlawat, N Aggarwal, D Gupta - International Journal of Cognitive …, 2025 - Elsevier
Significant research has been conducted during the last decade on the application of
machine learning for speech processing, particularly speech recognition. However, in recent …

Introduction to transformers: an NLP perspective

T Xiao, J Zhu - arXiv preprint arXiv:2311.17633, 2023 - arxiv.org
Transformers have dominated empirical machine learning models of natural language
processing. In this paper, we introduce basic concepts of Transformers and present key …

Diagonal state space augmented transformers for speech recognition

G Saon, A Gupta, X Cui - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org
The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

Joint prediction and denoising for large-scale multilingual self-supervised learning

W Chen, J Shi, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Multilingual self-supervised learning (SSL) has often lagged behind state-of-the-art (SOTA)
methods due to the expenses and complexity required to handle many languages. This …