A survey of transformers
Transformers have achieved great success in many artificial intelligence fields, such as
natural language processing, computer vision, and audio processing. Therefore, it is natural …
Recent developments on espnet toolkit boosted by conformer
In this study, we present recent developments on ESPnet: End-to-End Speech Processing
toolkit, which mainly involves a recently proposed architecture called Conformer …
Conformer: Convolution-augmented transformer for speech recognition
Recently Transformer and Convolution neural network (CNN) based models have shown
promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural …
On layer normalization in the transformer architecture
The Transformer is widely used in natural language processing tasks. To train a Transformer,
however, one usually needs a carefully designed learning rate warm-up stage, which is …
Squeezeformer: An efficient transformer for automatic speech recognition
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …
Understanding the difficulty of training transformers
Transformers have proved effective in many NLP tasks. However, their training requires non-
trivial efforts regarding designing cutting-edge optimizers and learning rate schedulers …
Findings of the IWSLT 2022 Evaluation Campaign
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …
The emergence of clusters in self-attention dynamics
B Geshkovski, C Letrouit… - Advances in Neural …, 2024 - proceedings.neurips.cc
Viewing Transformers as interacting particle systems, we describe the geometry of learned
representations when the weights are not time-dependent. We show that particles …
Energy transformer
Our work combines aspects of three promising paradigms in machine learning, namely,
attention mechanism, energy-based models, and associative memory. Attention is the power …
Nasvit: Neural architecture search for efficient vision transformers with gradient conflict-aware supernet training
Designing accurate and efficient vision transformers (ViTs) is an important but challenging
task. Supernet-based one-shot neural architecture search (NAS) enables fast architecture …