A survey of transformers

T Lin, Y Wang, X Liu, X Qiu - AI Open, 2022 - Elsevier
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …

Recent developments on ESPnet toolkit boosted by Conformer

P Guo, F Boyer, X Chang, T Hayashi… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments on ESPnet: End-to-End Speech Processing
toolkit, which mainly involves a recently proposed architecture called Conformer …

Conformer: Convolution-augmented transformer for speech recognition

A Gulati, J Qin, CC Chiu, N Parmar, Y Zhang… - arXiv preprint arXiv …, 2020 - arxiv.org
Recently, Transformer- and convolutional neural network (CNN)-based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural …
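
The core design in this paper is the Conformer block: two half-step ("Macaron") feed-forward layers sandwiching multi-head self-attention and a depthwise-convolution module, closed by a final LayerNorm. Below is a minimal PyTorch sketch of that block structure; the hyperparameter names d_model, n_heads, and kernel_size are my own, and dropout plus the paper's relative positional encoding are omitted, so this is an illustration rather than the reference implementation.

import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, kernel_size=31):
        super().__init__()
        self.ffn1 = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model), nn.SiLU(),
            nn.Linear(4 * d_model, d_model))
        self.norm_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(d_model)
        # Convolution module: pointwise conv + GLU, depthwise conv, pointwise conv.
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, 2 * d_model, 1), nn.GLU(dim=1),
            nn.Conv1d(d_model, d_model, kernel_size,
                      padding=kernel_size // 2, groups=d_model),
            nn.BatchNorm1d(d_model), nn.SiLU(),
            nn.Conv1d(d_model, d_model, 1))
        self.ffn2 = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model), nn.SiLU(),
            nn.Linear(4 * d_model, d_model))
        self.norm_out = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, time, d_model)
        x = x + 0.5 * self.ffn1(x)              # first half-step feed-forward
        a = self.norm_attn(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.norm_conv(x).transpose(1, 2)   # Conv1d expects (batch, d, time)
        x = x + self.conv(c).transpose(1, 2)
        x = x + 0.5 * self.ffn2(x)              # second half-step feed-forward
        return self.norm_out(x)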

On layer normalization in the transformer architecture

R Xiong, Y Yang, D He, K Zheng… - International …, 2020 - proceedings.mlr.press
The Transformer is widely used in natural language processing tasks. To train a Transformer,
however, one usually needs a carefully designed learning rate warm-up stage, which is …
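
The question the paper studies is where layer normalization sits relative to the residual connection: Post-LN (the original Transformer, which needs warm-up) versus Pre-LN (whose gradients are better behaved at initialization, allowing training without the warm-up stage). A minimal sketch of the two placements, with sublayer standing in for either attention or the feed-forward module (my naming, not the authors' code):

import torch.nn as nn

class PostLNBlock(nn.Module):
    """Original Transformer: LayerNorm applied after the residual addition."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreLNBlock(nn.Module):
    """Pre-LN variant: LayerNorm applied inside the residual branch."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

# Example: a Pre-LN block wrapping a feed-forward sublayer.
block = PreLNBlock(256, nn.Sequential(
    nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256)))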

Squeezeformer: An efficient transformer for automatic speech recognition

S Kim, A Gholami, A Shaw, N Lee… - Advances in …, 2022 - proceedings.neurips.cc
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks, owing to its hybrid attention-convolution architecture that …

Understanding the difficulty of training transformers

L Liu, X Liu, J Gao, W Chen, J Han - arXiv preprint arXiv:2004.08249, 2020 - arxiv.org
Transformers have proved effective in many NLP tasks. However, their training requires non-
trivial effort in designing cutting-edge optimizers and learning rate schedulers …

Findings of the IWSLT 2022 Evaluation Campaign

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

The emergence of clusters in self-attention dynamics

B Geshkovski, C Letrouit… - Advances in Neural …, 2024 - proceedings.neurips.cc
Viewing Transformers as interacting particle systems, we describe the geometry of learned
representations when the weights are not time-dependent. We show that particles …
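
Concretely, the interacting-particle view treats the n token representations x_1, …, x_n as points on the unit sphere evolving under attention. A simplified rendering of such dynamics (schematic, with inverse temperature β and the query/key/value weights taken as the identity; the paper's precise statement may differ) is

\dot{x}_i(t) \;=\; \mathbf{P}_{x_i(t)}\!\left( \frac{1}{Z_i(t)} \sum_{j=1}^{n} e^{\beta \langle x_i(t),\, x_j(t) \rangle}\, x_j(t) \right),
\qquad
Z_i(t) \;=\; \sum_{j=1}^{n} e^{\beta \langle x_i(t),\, x_j(t) \rangle},

where \mathbf{P}_{x} projects onto the tangent space of the sphere at x; the emergence of clusters corresponds to the particles collapsing toward a small set of limit points as t \to \infty.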

Energy transformer

B Hoover, Y Liang, B Pham, R Panda… - Advances in …, 2024 - proceedings.neurips.cc
Our work combines aspects of three promising paradigms in machine learning, namely,
attention mechanism, energy-based models, and associative memory. Attention is the power …
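
One standard way to make the attention/associative-memory connection concrete (a schematic, modern-Hopfield-style energy, not necessarily the exact functional this paper proposes) is the log-sum-exp energy over stored patterns k_1, …, k_M:

E(\xi) \;=\; -\frac{1}{\beta} \log \sum_{\mu=1}^{M} \exp\!\big( \beta\, \xi^{\top} k_{\mu} \big) \;+\; \frac{1}{2}\, \xi^{\top} \xi,

whose stationarity condition \nabla E(\xi) = 0 yields the update \xi \leftarrow \sum_{\mu} \operatorname{softmax}_{\mu}\!\big(\beta\, \xi^{\top} k_{\mu}\big)\, k_{\mu}, exactly a softmax attention step; an energy-based model of this kind descends such an energy until the representation settles at a stored memory.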

NASViT: Neural architecture search for efficient vision transformers with gradient conflict-aware supernet training

C Gong, D Wang - ICLR, 2022 - par.nsf.gov
Designing accurate and efficient vision transformers (ViTs) is an important but challenging
task. Supernet-based one-shot neural architecture search (NAS) enables fast architecture …